
Semiparametric prediction models for variables related with energy production


This paper reviews the semiparametric models developed over the years through an extensive collaboration between the Department of Statistics and Operations Research of the University of Santiago de Compostela and the power station located in As Pontes (A Coruña, Spain), owned by Endesa Generation, SA. These models were used to predict, half an hour in advance, the levels of sulphur dioxide in the surroundings of the power station. A new multidimensional semiparametric model is also considered. This model generalizes the previous ones and takes into account the correlation structure of the errors. Its behaviour is illustrated in a simulation study and with the prediction of the levels of two important pollution indicators in the surroundings of the power station: sulphur dioxide and nitrogen oxides.

Introduction: an environmental problem

The coal-fired power station in As Pontes is one of the production centers owned by Endesa Generation SA in the Iberian Peninsula. It is located in the town of As Pontes de García Rodríguez, northeast of A Coruña province.

This power station was designed and built to make use of lignite from the mine located in its vicinity. This solid fuel was characterized by its high moisture and sulphur contents and its low calorific value. Over the years the plant has undergone several transformations of its facilities with the aim of reducing emissions of sulphur dioxide (\(\mathrm{SO}_{2}\)). The power station completed its last adaptation in 2008 to consume, as primary fuel, imported subbituminous coal, characterized by its low sulphur and ash contents.

The location of the power plant, close to natural sites of high ecological value such as the Natural Park As Fragas do Eume, together with existing legislation, has meant that its impact on the environment has been a great concern from the beginning. The station therefore has a Supplementary Control System of Air Quality that allows it to change its operating conditions in order to reduce emissions when the weather conditions are adverse to the dispersion of the emitted smoke plume, specifically of the \(\mathrm{SO}_{2}\) it contains, and significant episodes of impaired air quality occur. Spanish law sets maximum concentrations that these gases may reach in a given period of time. For this plant, the only limit that might be exceeded at any time is the one established on the hourly mean (continuously computed) of the ground-level concentration of \(\mathrm{SO}_{2}\), namely 350 μg/m3.

The problem is to predict the future values of the \(\mathrm{SO}_{2}\) levels using the information received continuously at the sampling stations together with past information. Statistical forecast models are the key to obtaining these predictions and to suggesting a course of action to the plant operators.

In recent years, new statistical models have been designed to predict two pollution indicators simultaneously, owing to changes in the environmental legislation and in the power station itself, and to the construction of a new natural gas combined cycle station in the vicinity. Given the fuels that are going to be used, the main interest lies in predicting the values of nitrogen oxides (\(\mathrm{NO}_{x}\)), which are emitted by both facilities, together with the values of \(\mathrm{SO}_{2}\), which is emitted only by the power station.

All these changes have created a new problem: predicting hourly mean concentrations of sulphur dioxide and nitrogen oxides, measured in the surroundings of the two facilities. For this new problem, statistical forecast models are again an effective tool. Thus, a general multidimensional prediction model is designed (see Sect. 3).

Methods: one-dimensional predictive models

Models designed to solve the environmental problem

As a result of the collaboration over the past years between the Department of Statistics and Operations Research at the University of Santiago de Compostela and the Environment Section of the power station, the Integrated System of Statistical Prediction of the Immission (SIPEI, in Spanish) has been created. It employs statistical models to provide predictions of the levels of \(\mathrm{SO}_{2}\) with a half-hour horizon.

Because data are available in real time at one-minute frequency, and in accordance with current legislation, predictions of the future values of both pollutants are based on the hourly means of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\). Thus, two time series are constructed, \(X_{1,t}\) and \(X_{2,t}\), in which the subscript t represents a one-minute instant and each value is the average of the actual values over the last hour:

$$X_{1,t}=\frac{1}{60}\sum_{i=0}^{59}{ \mathrm{SO}_{2}(t-i)}\quad \mbox{and}\quad X_{2,t}= \frac{1}{60}\sum_{i=0}^{59}{ \mathrm{NO}_{x}(t-i)}, $$

where \(\mathrm{SO}_{2}(t)\) and \(\mathrm{NO}_{x}(t)\) represent the concentration of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\), respectively, at time t, measured in \(\mu g/m^{3}\).
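As an illustration of the hourly means defined above, the following minimal sketch computes the trailing 60-minute average from a one-minute series. The function name `hourly_mean` and the synthetic readings are assumptions made for the example; they are not part of SIPEI.

```python
import numpy as np

def hourly_mean(conc, window=60):
    """Trailing mean of the last `window` one-minute readings.

    The first window-1 entries are NaN because a full hour of data
    is not yet available at those instants.
    """
    conc = np.asarray(conc, dtype=float)
    out = np.full(conc.shape, np.nan)
    if conc.size >= window:
        # Cumulative sums give every window total in a single pass.
        csum = np.concatenate(([0.0], np.cumsum(conc)))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

# Hypothetical one-minute SO2 readings (ug/m3): two hours of synthetic data.
so2 = np.arange(120, dtype=float)
x1 = hourly_mean(so2)
```

Each entry `x1[t]` averages the readings at minutes `t-59` to `t`, matching the sum over `i = 0, ..., 59` in the definition.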

The series of hourly \(\mathrm{SO}_{2}\) means has a characteristic behaviour, highly influenced by weather conditions and local topography. It takes values close to zero for long periods of time, and it can suddenly and sharply increase (episodes) in bad meteorological conditions for the dispersion of the smoke plume. Nowadays, the series of hourly \(\mathrm{NO}_{x}\) means has a similar behaviour to that of \(\mathrm{SO}_{2}\), but on a smaller scale (see Fig. 1). The main objective of the developed statistical models is to predict the episodes, so our interest is centred on the values that occur less frequently along the time series.

Figure 1

Episode recorded at one of the sampling stations. The one-hour means of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) are drawn in red and orange, respectively

Because of this, a kind of memory called the Historical Matrix was designed (Prada-Sánchez and Febrero-Bande [14]), which has been essential to the behaviour of all the models developed so far. This matrix is composed of a large number of vectors of the form \((X_{t-l},\ldots,X_{t},X_{t+k})\): real data of bihourly \(\mathrm{SO}_{2}\) or \(\mathrm{NO}_{x}\) means, chosen so as to cover the full range of the variable in question and to play the role of a historical memory. To ensure that the entire range of the variable is covered, the matrix is divided into blocks according to the level of the response variable, \(X_{t+k}\). The memory is updated at every instant: when a new observation is received, the class to which it belongs is found, the oldest datum in that class leaves the matrix and the new observation enters it. A sample built in this way always contains up-to-date information on the full range of variation of the variable of interest, and over the years this concept has been adapted to the different statistical techniques used.
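The update rule of the historical matrix can be sketched as a set of fixed-size first-in, first-out blocks, one per response class. The class name `HistoricalMatrix`, the class boundaries and the block size below are hypothetical choices for illustration; the source does not specify them.

```python
from collections import deque

class HistoricalMatrix:
    """Blocks of vectors (X_{t-l}, ..., X_t, X_{t+k}), one block per response level."""

    def __init__(self, edges, per_class):
        # `edges` are increasing response levels separating the classes
        # (hypothetical values); each block keeps at most `per_class` vectors.
        self.edges = list(edges)
        self.blocks = [deque(maxlen=per_class) for _ in range(len(self.edges) + 1)]

    def class_of(self, response):
        # Index of the first boundary that exceeds the response.
        for i, edge in enumerate(self.edges):
            if response < edge:
                return i
        return len(self.edges)

    def update(self, vector):
        # The last entry of the vector is the response X_{t+k}. Appending to a
        # full deque silently drops the oldest vector of that class.
        self.blocks[self.class_of(vector[-1])].append(tuple(vector))

    def sample(self):
        # All stored vectors, used as the training sample by the predictors.
        return [v for block in self.blocks for v in block]

hm = HistoricalMatrix(edges=[50.0, 150.0], per_class=2)
for vec in [(1, 2, 10), (3, 4, 20), (5, 6, 30), (7, 8, 200)]:
    hm.update(vec)
```

After these four updates the lowest class holds only its two most recent vectors, while the single high-level vector is retained in its own block, so rare episode values are not pushed out by the far more frequent low values.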

The first semiparametric model

In the early years of development, data were transmitted to SIPEI every five minutes, and the legislation in force at that time established the limit values for the two-hour mean of \(\mathrm {SO}_{2}\). For this reason, the prediction models for \(\mathrm{SO}_{2}\) levels initially worked with series of bihourly means. The objective was to predict this time series with a half-hour horizon. Therefore, each time a new observation \(X_{t}\) is received, the value six instants ahead, \(X_{t+6}\), has to be predicted.

A semiparametric approach was considered (García-Jurado et al. [8]) which generalizes the traditional Box–Jenkins models as follows:

$$X_{t+\kappa}=\varphi_{\kappa}(X_{t},X_{t-l})+Z_{t+\kappa}, \quad \kappa ,l\in\mathbb{Z}^{+}, $$

where \(Z_{t}\) has an ARIMA structure with zero mean and is independent of \(X_{t}\) (Box et al. [1]).

In particular, at each time t the regression function \(\varphi _{6}(X_{t},X_{t-1})=\mathbb{E}(X_{t+6}/X_{t},X_{t-1})\) is estimated with the well-known Nadaraya–Watson kernel estimator (see Nadaraya [13] and Watson [19]) using the information provided by the historical matrix. The second step is to compute the residual time series \(\hat {Z}_{t-64},\ldots,\hat{Z}_{t}\) for the last six hours, where \(\hat{Z}_{i}=X_{i}-\hat{\mathbb{E}}(X_{i}/X_{i-6},X_{i-7})\) for each i, and to fit an appropriate ARIMA model to it. Finally, the Box–Jenkins prediction \(\hat{Z}_{t+6}\) is obtained. The final point prediction proposed is \(\hat{\mathbb{E}}(X_{t+6}/X_{t},X_{t-1})+\hat{Z}_{t+6}\).
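The nonparametric step can be sketched as follows. This is a minimal Nadaraya–Watson estimator that assumes a Gaussian product kernel and a single bandwidth for both covariates; the kernel, the bandwidth value and the toy data are illustrative assumptions rather than the choices made in SIPEI.

```python
import numpy as np

def nadaraya_watson(z_train, y_train, z_new, bandwidth):
    """Kernel regression estimate of E[Y | Z = z] at the rows of z_new.

    z_train: (n, q) covariates, e.g. rows (X_t, X_{t-1}) from the historical matrix
    y_train: (n,) responses, e.g. X_{t+6}
    z_new:   (m, q) points where the regression function is evaluated
    """
    z_train = np.atleast_2d(np.asarray(z_train, dtype=float))
    z_new = np.atleast_2d(np.asarray(z_new, dtype=float))
    y_train = np.asarray(y_train, dtype=float)
    # Scaled differences between every evaluation point and every sample point.
    diff = (z_new[:, None, :] - z_train[None, :, :]) / bandwidth
    weights = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    # Weighted average of the responses; weights in each row are normalized.
    return weights @ y_train / weights.sum(axis=1)

# Toy sample: two training vectors placed symmetrically around the query point.
z_hist = np.array([[0.0, 0.0], [2.0, 2.0]])
y_hist = np.array([0.0, 2.0])
pred = nadaraya_watson(z_hist, y_hist, np.array([[1.0, 1.0]]), bandwidth=1.0)
```

The residuals \(\hat{Z}_{i}\) are then the differences between the observed values and these fitted values, and any standard ARIMA routine can be applied to them. If a query point lies very far from all the training vectors, the weights can underflow to zero, so a practical implementation would need to guard against that case.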

Partially linear model

The previous semiparametric model uses only the past of the time series to obtain its predictions; however, additional information might improve them. Specifically, meteorological and emission variables have been used within the so-called partially linear models (Prada-Sánchez et al. [15]) to estimate bihourly mean values of \(\mathrm{SO}_{2}\) one hour in advance.

Data of the form \((V_{t},Z_{t},Y_{t})\) are considered, where \(V_{t}\) is a vector of exogenous variables, \(Z_{t}=(X_{t},X_{t-l})\) and \(Y_{t}=X_{t+12}\), with \(X_{t}\) the series of bihourly \(\mathrm{SO}_{2}\) means. It is assumed that the data follow the partially linear model \(Y_{t}=V^{t}_{t}\beta+\varphi(Z_{t})+\epsilon_{t}\), where \(V^{t}_{t}\) denotes the transpose of \(V_{t}\) and \(\epsilon_{t}\) is a zero-mean error term.

This model can easily be estimated following Speckman [18], and it allows us to extend the horizon to one hour while maintaining the same level of accuracy as the semiparametric model for the half-hour horizon. In any case, the incorporation of external information only slightly improves the prediction, because the measuring point for the meteorological variables is located 80 m above ground level, which is relatively far from (and so uncorrelated with) the typical height of the emitted smoke plume (above 800 m over ground level). Emission information is also of little interest because these signals are almost constant, especially when the facility is working, and they do not describe at all why the smoke plume falls to the ground. For these reasons, meteorological and emission information was not considered in the following models.
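A Speckman-type estimate of the partially linear model can be sketched in a few lines: both the response and the exogenous variables are first smoothed on \(Z_{t}\), and β is then obtained by least squares on the partial residuals. The Gaussian kernel, the bandwidth and the synthetic data below are assumptions made for illustration only.

```python
import numpy as np

def smoother_matrix(z, bandwidth):
    """Nadaraya-Watson smoother matrix on the rows of z (each row sums to one)."""
    diff = (z[:, None, :] - z[None, :, :]) / bandwidth
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    return w / w.sum(axis=1, keepdims=True)

def speckman_fit(v, z, y, bandwidth):
    """Estimate beta and phi in Y = V' beta + phi(Z) + error."""
    s = smoother_matrix(z, bandwidth)
    v_tilde = v - s @ v          # exogenous variables with the Z-effect removed
    y_tilde = y - s @ y          # response with the Z-effect removed
    beta, *_ = np.linalg.lstsq(v_tilde, y_tilde, rcond=None)
    phi_hat = s @ (y - v @ beta)  # smooth what the linear part leaves unexplained
    return beta, phi_hat

# Synthetic check with a known beta (hypothetical data, not plant measurements).
rng = np.random.default_rng(3)
n = 200
z = rng.uniform(0.0, 1.0, size=(n, 1))
v = rng.normal(size=(n, 2))
beta_true = np.array([2.0, -1.0])
y = v @ beta_true + np.sin(2 * np.pi * z[:, 0]) + rng.normal(0.0, 0.1, size=n)
beta_hat, phi_hat = speckman_fit(v, z, y, bandwidth=0.05)
```

Removing the smooth effect of \(Z_{t}\) from both sides before the regression is what keeps the estimate of β from absorbing part of the nonparametric component.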

Neural networks

The change in the series of interest established by the European Council Directive 1999/30/CE, from bihourly means to hourly means, makes the time series less smooth. At the beginning, the previous semiparametric model was adapted to work on the new series of hourly means. The results showed a considerable increase in the variability of the predictions compared with the results usually obtained for the series of two-hour means.

In an attempt to improve the response given by the SIPEI, and in particular, its point predictions with half an hour horizon, new predictors based on neural networks models were developed (Fernández de Castro et al. [6]).

A neural network model was designed to provide predictions of the one-hour mean values of \(\mathrm{SO}_{2}\) half an hour in advance. It consists of an input layer, one hidden layer and an output layer. The number of nodes in the output layer is determined by the size of the response to be obtained from the network; in this case there is a single output, the prediction of \(X_{t+6}\). The input to the network is the bidimensional vector \((X_{t-3},X_{t})\). The activation function of the nodes in the hidden layer is the logistic function, and that of the output layer is the identity function.

The predictor given by the neural network has the following expression:

$$\hat{X}_{t+6}=o_{1}=\sum_{j=1}^{L}{ \omega_{1j}^{o}f_{j}^{h} \bigl( \theta_{j}^{h}+\omega _{j1}^{h}X_{t-3}+ \omega_{j2}^{h}X_{t} \bigr)} $$

with \(f_{j}^{h}(z)=\frac{1}{1+e^{-z}}\).

The weights \(\{\omega_{j1}^{h},\omega_{j2}^{h},\omega_{1j}^{o}; j=1,\ldots ,L\}\) and the bias (threshold) terms \(\{\theta_{j}^{h}; j=1,\ldots,L\}\) are determined during the training process. The final number L of hidden layer nodes is chosen as the value for which the network gives the best results, after training networks with identical architecture and different values of L. The training set of the neural network was built from the historical matrices introduced above, suitably adapted.
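The predictor above is a single forward pass through the network. The sketch below evaluates it for given parameters; the array layout (`w_h` with one row per hidden node) and the example parameter values are assumptions, and training is not shown.

```python
import numpy as np

def logistic(z):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))

def nn_predict(x_lag3, x_now, theta_h, w_h, w_o):
    """Forward pass giving the prediction of X_{t+6}.

    theta_h: (L,) hidden-layer thresholds
    w_h:     (L, 2) hidden-layer weights for the inputs (X_{t-3}, X_t)
    w_o:     (L,) output-layer weights (identity activation, no output bias)
    """
    hidden = logistic(theta_h + w_h[:, 0] * x_lag3 + w_h[:, 1] * x_now)
    return float(w_o @ hidden)

# Hypothetical parameters with L = 4 hidden nodes, chosen so the result is easy to check:
# zero thresholds and weights make every hidden node output 0.5.
L = 4
pred = nn_predict(120.0, 150.0, np.zeros(L), np.zeros((L, 2)), np.ones(L))
```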

Figure 2 shows the forecasts given half an hour before by the neural network with 50 nodes in its hidden layer for an episode depicted in one of the measuring stations. The good behaviour of the forecast (dotted line) can easily be seen. The procedures based on neural networks accurately predict the real one hour mean \(\mathrm {SO}_{2}\) air quality values (solid line). These models were optimized later with boosting learning techniques (Fernández de Castro and González-Manteiga [4]).

Figure 2

Episode of \(\mathrm{SO}_{2}\) recorded at one of the sampling stations (red) together with the prediction provided by the neural network (blue) (Fernández de Castro et al. [6])

Functional data model

The one-hour mean values of \(\mathrm{SO}_{2}\) can be treated as observations of a stochastic process in continuous time. As discussed above, the interest lies in predicting with a half-hour horizon, so each curve is obtained by interpolating the data over half an hour. In this case the curves were obtained from six consecutive five-minute observations, which act as the sampling points of each functional datum. Therefore, we use random variables with values in the Hilbert space \(H=L^{2}([0,6])\) of the form \(X_{t}(u)=x(6t+u)\).

The following statistical model is considered: \(X_{t}=\rho (X_{t-1})+\epsilon_{t}\), where \(\epsilon_{t}\) is a Hilbertian strong white noise and \(\rho:H\to H\) is the operator to be estimated. For the estimation of ρ, a functional kernel estimator has been used in the framework of the autoregressive Hilbertian model of order one. Furthermore, the concept of historical matrix has been suitably adapted to the case where the data are curves (Fernández de Castro et al. [5]).

Other approaches designed to predict probabilities

The models described so far provide point predictions of \(\mathrm {SO}_{2}\), but other techniques have also been developed to predict probabilities. The aim of these alternative models is to estimate the probability that the series of bihourly \(\mathrm{SO}_{2}\) measures exceeds a certain level r one hour ahead; that is, in our case we predict \(\mathbb{P} (Z_{t} )=\mathbb{P} (X_{t+12}>r|Z_{t} )\) with \(Z_{t}= (X_{t},X_{t}-X_{t-3} )\). To this end, additive models with an unknown link function (Roca-Pardiñas et al. [17]) have been used.

More complex generalized additive models (GAM) with second-order interaction terms have also been considered (Roca-Pardiñas et al. [16]). It was shown that the GAM with interactions detects the onset of episodes earlier than the GAM without interactions.

Alternative one-dimensional models: additive models

In the statistical literature there is a wide range of one-dimensional models which can be used to predict the levels of \(\mathrm{SO}_{2}\). We will focus on the techniques we will use in the next section to construct our multidimensional model: additive models for continuous response.

There have been a number of proposals for fitting the additive models. Friedman and Stuetzle [7] introduced a backfitting algorithm and Buja et al. [2] studied its properties. Mammen et al. [12] proposed the so called smooth backfitting by employing projection arguments. Let \(\{ (Y_{t},Z_{t}) \}_{t=1}^{T}\) be a random sample of a strictly stationary time series, with \(Y_{t}\) one-dimensional and \(Z_{t}\) q-dimensional following the model:

$$ Y_{t}=m (Z_{t})+\epsilon_{t},\quad t\in\mathbb{Z}, $$

where \(\{\epsilon_{t} \}\) is a white noise process and \(\mathbb{E}[\epsilon_{t}|Z_{t}]=0\).

Typically, it is assumed that the function m is additive with component functions \(m_{j}\), for \(j=0,\ldots,q\), thus

$$ Y_{t}=m_{0}+m_{1}(Z_{1,t})+ \cdots+m_{q}(Z_{q,t})+\epsilon_{t}. $$

A generalized kernel nonparametric estimation can be given using smooth backfitting for the functions \(m_{1},\ldots,m_{q}\) (see again the above mentioned papers).
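To make the additive structure concrete, the sketch below fits model (2) with the classical backfitting algorithm of Friedman and Stuetzle and Nadaraya–Watson smoothers. Note that this is the simpler classical backfitting, not the smooth backfitting of Mammen et al. used later in the paper; the bandwidth, the number of iterations and the synthetic data are assumptions.

```python
import numpy as np

def nw_smooth(x, r, bandwidth):
    """Smooth the values r against the scalar covariate x (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(z, y, bandwidth, n_iter=20):
    """Classical backfitting for Y = m0 + m1(Z1) + ... + mq(Zq) + error."""
    n, q = z.shape
    m0 = y.mean()
    m = np.zeros((n, q))          # m[:, j] holds m_j evaluated at the sample points
    for _ in range(n_iter):
        for j in range(q):
            # Partial residual: remove the constant and all other components.
            partial = y - m0 - m.sum(axis=1) + m[:, j]
            m[:, j] = nw_smooth(z[:, j], partial, bandwidth)
            m[:, j] -= m[:, j].mean()   # centre each component for identifiability
    return m0, m

# Synthetic additive data (hypothetical, for illustration only).
rng = np.random.default_rng(2)
z = rng.uniform(-1.0, 1.0, size=(300, 2))
truth = 1.0 + z[:, 0] ** 2 + np.sin(np.pi * z[:, 1])
y = truth + rng.normal(0.0, 0.1, size=300)
m0, m = backfit(z, y, bandwidth=0.1)
fitted = m0 + m.sum(axis=1)
```

Each component is only identified up to an additive constant, which is why every \(m_{j}\) is centred and the overall level is carried by \(m_{0}\).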

All the models described above usually require the selection of a regularization parameter (the bandwidth in kernel smoothing, the number of neurons in the hidden layer for neural networks, …). This parameter was calibrated using cross-validation techniques with the information in the updated Historical Matrix.

Methods: multidimensional semiparametric prediction

The new goal is to incorporate the prediction of \(\mathrm{NO}_{x}\) half an hour in advance while continuing to provide the predictions of \(\mathrm{SO}_{2}\), as already mentioned. The idea is to generalize the one-dimensional semiparametric approach proposed by García-Jurado et al. [8], taking into account the correlation structure between the components of the vector series to be predicted.

The model

Let \((Y, Z )= (Y_{l}, Z_{l} )\), \(l=0,\pm1,\pm 2,\ldots\), be a strictly stationary vector time series, where \(Y_{l}\) is an r-dimensional response series and \(Z_{l}\) is a q-dimensional covariate series, and let \(\{(Y_{t},Z_{t}) \}_{t=1}^{T}\) be a random sample of \((Y,Z )\). The following model is considered

$$ Y_{t}=\varphi(Z_{t})+\mathcal{E}_{t}, \tag{3} $$

where \(Y_{t}= (Y_{1,t},\ldots,Y_{r,t} )^{t}\), \(Z_{t}= (Z_{1,t},\ldots,Z_{q,t} )^{t}\) and \(\mathcal{E}_{t}= (\mathcal {E}_{1,t},\ldots,\mathcal{E}_{r,t} )^{t}\). Let us consider two possible structures for the multidimensional residuals series:


(P1) Each \(\mathcal{E}_{k,t}\) is a stationary AR(\(p_{k}\)) process of the form

$$\mathcal{E}_{k,t}=\sum_{i=1}^{p_{k}}{ \phi_{k}^{i}\mathcal {E}_{k,t-i}+\xi_{k,t}} \quad \mbox{for all } t\in\mathbb{Z}, k=1,\ldots,r $$

independent of \(Z_{t}\), where \(\xi_{k,t}\) is a white noise process with variance \(\sigma_{k}^{2}\), for \(k=1,\ldots,r\).


(P2) \(\mathcal{E}_{t}\) has a VAR(p) structure of the form

$$\mathcal{E}_{t}=\sum_{i=1}^{p}{ \Phi_{i}\mathcal{E}_{t-i}+\xi_{t}} \quad \mbox{for all } t\in\mathbb{Z}, $$

independent of \(Z_{t}\), where the \(\Phi_{i}\) are fixed (\(r\times r\)) coefficient matrices and \(\xi_{t}\) is an r-dimensional white noise process, i.e. \(\mathbb{E}(\xi_{t})=0\), \(\mathbb{E}(\xi_{t} \xi '_{t})=\Sigma_{\xi}\) and \(\mathbb{E}(\xi_{t} \xi'_{s})=0\) for \(t\neq s\).

Our main objective is to predict \(Y_{t}\) κ instants ahead using a sample of size T. The prediction of \(Y_{t+\kappa}\) is then defined by

$$ \dot{Y}_{t+\kappa}=\hat{\varphi}_{\kappa}(Z_{t})+ \dot{\mathcal {E}}_{t+\kappa}, \tag{4} $$

where \(\hat{\varphi}_{\kappa}(Z_{t})\) is a nonparametric estimate of \(\varphi_{\kappa}(Z_{t})=\mathbb{E} [Y_{t+\kappa}/Z_{t} ]\) and \(\dot{\mathcal{E}}_{t+\kappa}\) the prediction given, κ instants ahead, for the residual series constructed as \(\hat{\mathcal {E}}_{t+\kappa}=Y_{t+\kappa}-\hat{\varphi}_{\kappa}(Z_{t})\).


We assume that model (3) holds. The first step is to estimate φ nonparametrically and independently for each of the r components of \(Y_{t}\): \({\varphi}(Z_{t})=({\varphi}_{1}(Z_{t}),\ldots ,{\varphi}_{r}(Z_{t}))\). Furthermore, we assume that the functions \(\varphi_{k}\) are additive with component functions \(\varphi_{k}^{j}\), for \(k=1,\ldots,r\) and \(j=0,\ldots,q\), thus

$$ {\varphi}_{k}(Z_{t})={\varphi}_{k}^{0}+{ \varphi}_{k}^{1}(Z_{1,t})+\cdots +{ \varphi}_{k}^{q}(Z_{q,t}),\quad k=1,\ldots,r. $$

Therefore, r additive models with q covariates are estimated using the smooth backfitting technique. We have to take into account that the process \(\mathcal{E}_{t}\) is not observable since the function φ is not known. Thus, we have to replace \(\mathcal{E}_{t}\) by the residuals

$$\hat{\mathcal{E}}_{t}=Y_{t}-\hat{\varphi}(Z_{t}) $$

and use these approximations to \(\mathcal{E}_{t}\) in the maximum likelihood estimations later defined.

To estimate the parametric part of the model, we must consider the two possible error structures proposed above:


For the first structure (componentwise AR errors), the parameters \(\phi_{k}=(\phi_{k}^{1},\ldots,\phi_{k}^{p_{k}})\) of the error process \(\{\mathcal{E}_{k,t} \}\) are estimated by standard maximum likelihood methods. In particular, we use a conditional maximum likelihood estimator for every component, of the form

$$\hat{\phi}_{k}=\operatorname{arg} \max_{\phi_{k}\in\Phi} \hat{l}(\phi_{k}), $$

where Φ is a compact parameter space and \(\hat{l}\) is the conditional log-likelihood given by

$$\hat{l} \bigl(\phi_{k},\sigma_{k}^{2} \bigr)=- \frac{T}{2}\log(2\pi)+\frac {T}{2}\log \bigl(\sigma_{k}^{-2} \bigr)-\frac{1}{2}\sum_{t=p_{k}+1}^{T} \bigl( \bigl(\hat{\mathcal{E}}_{k,t}-\hat{\mathcal {E}}_{k,t}( \phi_{k}) \bigr)/\sigma_{k} \bigr)^{2} $$

with \(\hat{\mathcal{E}}_{k,t}(\phi_{k})=\sum_{i=1}^{p_{k}}{\phi _{k}^{i}\hat{\mathcal{E}}_{k,t-i}}\).
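For Gaussian innovations, maximizing this conditional log-likelihood in \(\phi_{k}\) is equivalent to minimizing the sum of squared one-step errors, so the estimator reduces to a least squares regression of the residuals on their own lags. The sketch below uses this equivalence; the function name and the simulated AR(2) series are assumptions for illustration.

```python
import numpy as np

def fit_ar_conditional(resid, p):
    """Conditional ML (least squares) fit of an AR(p) model to a residual series."""
    resid = np.asarray(resid, dtype=float)
    T = resid.size
    # Column i-1 holds the residuals lagged by i, aligned with resid[p:].
    lags = np.column_stack([resid[p - i: T - i] for i in range(1, p + 1)])
    target = resid[p:]
    phi, *_ = np.linalg.lstsq(lags, target, rcond=None)
    sigma2 = np.mean((target - lags @ phi) ** 2)   # innovation variance estimate
    return phi, sigma2

# Simulated AR(2) residuals with known coefficients (hypothetical values).
rng = np.random.default_rng(1)
e = np.zeros(20000)
shocks = rng.normal(size=e.size)
for t in range(2, e.size):
    e[t] = 0.5 * e[t - 1] - 0.3 * e[t - 2] + shocks[t]
phi_hat, sigma2_hat = fit_ar_conditional(e, p=2)
```

The variance estimate is the mean squared one-step error, which is the value of \(\sigma_{k}^{2}\) that maximizes the same likelihood once \(\phi_{k}\) is fixed.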


For the second structure (VAR errors), the coefficient matrices \((\Phi_{1},\ldots,\Phi_{p})\) of the r-dimensional error process \(\{\mathcal{E}_{t} \}\) are also estimated by maximum likelihood methods (Hamilton [10]). First, we need to establish the following notation: let \(\Phi^{t} = [ \Phi_{1}\:\Phi_{2}\: \ldots\: \Phi_{p} ]\) denote the \((r\times rp)\) coefficient matrix, and let \(X_{t}\) be the \((rp\times1)\) vector containing p lags of each of the elements of \(\mathcal{E}_{t}\): \(X_{t}^{t}=[\mathcal {E}^{t}_{t-1}\:\mathcal{E}^{t}_{t-2}\:\ldots\:\mathcal{E}^{t}_{t-p}]\).

The theoretical conditional log-likelihood function to be optimized has the following expression:

$$l(\Phi, \Sigma_{\xi})=-\frac{rT}{2}\log(2\pi)+\frac{T}{2} \log \bigl\vert \Sigma_{\xi}^{-1} \bigr\vert - \frac{1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\mathcal{E}_{t}-\Phi^{t}X_{t} \bigr)^{t}\:\Sigma_{\xi}^{-1} \bigl( \mathcal{E}_{t}-\Phi^{t}X_{t} \bigr) \bigr]}. $$

Since \(\mathcal{E}_{t}\) is not observable, it is replaced by the residuals \(\hat{\mathcal{E}}_{t}\), and the empirical conditional log-likelihood is:

$$\hat{l}(\hat{\Phi}, \hat{\Sigma}_{\xi})=-\frac{rT}{2}\log(2\pi )+ \frac{T}{2}\log \bigl\vert \hat{\Sigma}_{\xi}^{-1} \bigr\vert -\frac {1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\hat{\mathcal{E}}_{t}-\hat{\Phi}^{t}\hat {X}_{t} \bigr)^{t} \: \hat{\Sigma}_{\xi}^{-1} \bigl(\hat{\mathcal {E}}_{t}-\hat{\Phi}^{t} \hat{X}_{t} \bigr) \bigr]}. $$
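For a Gaussian VAR, the maximizer of the conditional log-likelihood in the coefficient matrices coincides with equation-by-equation least squares on the lagged residuals (Hamilton [10]). The sketch below relies on that fact and also iterates the fitted recursion to forecast the residual vector; the function names and the simulated VAR(1) are assumptions for illustration.

```python
import numpy as np

def fit_var_conditional(resid, p):
    """Least squares (conditional ML) fit of a VAR(p) to a (T, r) residual array."""
    resid = np.asarray(resid, dtype=float)
    T, r = resid.shape
    # Each row of X stacks the p previous residual vectors: [E_{t-1}', ..., E_{t-p}'].
    X = np.hstack([resid[p - i: T - i] for i in range(1, p + 1)])
    Y = resid[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # (r*p, r), i.e. Phi in the text
    innovations = Y - X @ B
    sigma = innovations.T @ innovations / (T - p)  # innovation covariance estimate
    phis = [B[i * r:(i + 1) * r].T for i in range(p)]
    return phis, sigma

def forecast_var(resid, phis, steps):
    """Iterate the fitted VAR recursion `steps` instants beyond the last residual."""
    hist = [row for row in np.asarray(resid, dtype=float)[-len(phis):]]
    for _ in range(steps):
        hist.append(sum(phi @ hist[-i - 1] for i, phi in enumerate(phis)))
    return hist[-1]

# Simulated bivariate VAR(1) with a known, stable coefficient matrix (hypothetical).
phi_true = np.array([[0.5, 0.2], [0.1, 0.3]])
rng = np.random.default_rng(4)
e = np.zeros((20000, 2))
shocks = rng.normal(size=e.shape)
for t in range(1, e.shape[0]):
    e[t] = phi_true @ e[t - 1] + shocks[t]
phis_hat, sigma_hat = fit_var_conditional(e, p=1)
```

Fitting the r components jointly is what allows the cross-lag terms (the off-diagonal entries of each \(\Phi_{i}\)) to contribute to the residual forecast.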

Other considerations: the phenomenon of cointegration

Sometimes vector processes can be cointegrated, so the correlation structure between the series has to be taken into account. Cointegration has been one of the most important concepts in time series analysis since it was formally developed by Granger [9] and Engle and Granger [3]. It has broad applications in the analysis of economic data and has given rise to many publications in the economic literature.

Let \(Y_{t}=(Y_{1,t},\ldots,Y_{r,t})^{t}\) be a vector of r time series integrated of order 1 (\(I(1)\)). \(Y_{t}\) is said to be cointegrated if there exists a linear combination of its components that is stationary (\(I(0)\)), i.e., if there exists a vector \(\beta=(\beta _{1},\ldots,\beta_{r})^{t}\) such that

$$\beta^{t}Y_{t}=\beta_{1}Y_{1,t}+\cdots+ \beta_{r}Y_{r,t}\sim I(0). $$

The vector β is called the cointegration vector. This vector is not unique, since for any nonzero scalar c the linear combination \(c\beta^{t}Y_{t}=\beta^{*t}Y_{t}\sim I(0)\). Therefore, a normalization is often assumed in order to identify a unique β. A typical normalization is \(\beta=(1,-\beta_{2},\ldots,-\beta_{r})^{t}\).

Johansen [11] addresses cointegration within an error correction model in the framework of vector autoregressive (VAR) models. Consider then a general VAR(p) model for the vector of r series \(Y_{t}\)

$$Y_{t}=\Phi_{0} D_{t}+\Phi_{1}Y_{t-1}+ \cdots+\Phi_{p}Y_{t-p}+\xi_{t},\quad t=1,\ldots,T, $$

where \(D_{t}\) contains deterministic terms (constant, trend, …).

Suppose \(Y_{t}\) is \(I(1)\) and possibly cointegrated. Then, the VAR representation is not the most suitable representation for analysis because the cointegrating relationships are not explicitly apparent. The cointegrating relationships become apparent if the VAR model is transformed to a vector error correction model of order p (VECM(p))

$$\Delta Y_{t}=\Phi_{0} D_{t}+\Pi Y_{t-1}+ \Gamma_{1}\Delta Y_{t-1}+\cdots +\Gamma_{p-1}\Delta Y_{t-p+1}+\xi_{t}, $$

where \(\Pi=\Phi_{1}+\cdots+\Phi_{p}-I_{r}\), \(\Gamma_{k}=-\sum_{j=k+1}^{p}{\Phi_{j}}\), \(k=1,\ldots,p-1\) and \(\Delta Y_{t}=Y_{t}-Y_{t-1}\). The matrix Π is called the long-run impact matrix and the \(\Gamma_{k}\) are the short-run impact matrices. Moreover, the rank of the singular matrix Π gives the number of cointegration relations that exist, i.e., the cointegration rank. Johansen proposes a sequential procedure of likelihood ratio tests to estimate this rank.
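The change of representation from VAR to VECM is purely algebraic and can be sketched directly from the definitions of Π and \(\Gamma_{k}\). The example matrix below is an illustrative VAR(1) in which the first component is a random walk and the sum of the two components is a stationary AR(1); it is an assumption made for the example. The rank is read off numerically here, whereas in practice it has to be estimated with Johansen's tests.

```python
import numpy as np

def var_to_vecm(phis):
    """Long-run matrix Pi and short-run matrices Gamma_1..Gamma_{p-1} of a VAR(p).

    phis: list [Phi_1, ..., Phi_p] of (r, r) arrays.
    """
    r = phis[0].shape[0]
    p = len(phis)
    pi = sum(phis) - np.eye(r)
    # Gamma_k = -(Phi_{k+1} + ... + Phi_p); phis[j] holds Phi_{j+1}.
    gammas = [-sum(phis[j] for j in range(k, p)) for k in range(1, p)]
    return pi, gammas

# Illustrative cointegrated VAR(1): Y1 is a random walk, Y1 + Y2 is stationary.
phi1 = np.array([[1.0, 0.0], [-0.25, 0.75]])
pi, gammas = var_to_vecm([phi1])
coint_rank = np.linalg.matrix_rank(pi)
```

Here Π has rank one, and its rows are proportional to (1, 1), which is the cointegration vector of this example: the sum of the two components is stationary even though each component is I(1).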

Prediction scheme

We present now the prediction scheme step by step:

  1. At every instant t, \(\varphi_{\kappa}(Z_{t})\) is estimated with the smooth backfitting technique, independently for each of the r components, using the data \((Y_{l},Z_{l-\kappa})\), \(l=\kappa+1,\ldots,T\).

  2. The residual series \(\hat{\mathcal{E}}_{t+\kappa}\) is computed by

    $$\hat{\mathcal{E}}_{t+\kappa}=Y_{t+\kappa}-\hat{\varphi}_{\kappa }(Z_{t}), \quad t=1,\ldots,T-\kappa. $$

  3. The next step is to fit an appropriate model to the error structure (VECM) and to obtain the prediction κ instants ahead: \(\dot{\mathcal{E}}_{T+\kappa}\).

  4. The proposed final prediction is given by (4).
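The four steps can be put together in a compact sketch. To keep it short, two simplifications are made and should be read as assumptions: the trend is estimated with a multivariate Nadaraya–Watson smoother instead of smooth backfitting, and the residuals are modelled with a VAR(1) fitted by least squares instead of a VECM. The covariate is taken as \(Z_{t}=Y_{t}\) for illustration.

```python
import numpy as np

def nw(z_train, y_train, z_new, h):
    """Nadaraya-Watson estimate of E[Y | Z = z_new] with a Gaussian kernel."""
    d = (z_new[:, None, :] - z_train[None, :, :]) / h
    w = np.exp(-0.5 * (d ** 2).sum(axis=2))
    return (w @ y_train) / w.sum(axis=1, keepdims=True)

def predict_ahead(y, kappa, h):
    """Predict Y_{T+kappa} from a (T, r) array y, taking Z_t = Y_t."""
    z, target = y[:-kappa], y[kappa:]
    # Step 1: nonparametric trend from the pairs (Z_t, Y_{t+kappa}).
    fitted = nw(z, target, z, h)
    # Step 2: residual series.
    resid = target - fitted
    # Step 3: VAR(1) on the residuals, iterated kappa steps from the last residual.
    b, *_ = np.linalg.lstsq(resid[:-1], resid[1:], rcond=None)
    resid_forecast = np.linalg.matrix_power(b.T, kappa) @ resid[-1]
    # Step 4: trend at the latest covariate plus the residual forecast.
    trend_forecast = nw(z, target, y[-1:], h)[0]
    return trend_forecast + resid_forecast

# Hypothetical bivariate series: a constant level, for which the forecast is that level.
y_const = np.tile([25.0, 10.0], (100, 1))
forecast = predict_ahead(y_const, kappa=6, h=1.0)
```

In the scheme of the paper the same structure applies, with the smoother in step 1 replaced by smooth backfitting of the additive components and the model in step 3 chosen according to the error structure.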

This scheme is a natural generalization of the one-dimensional prediction models described in Sect. 2.1.1. In the next two sections simulation examples and real data analysis are considered.

Results and discussion

A simulation study

To analyze the behavior of the proposed prediction procedure, a simulation study has been performed generating samples from artificial series and making a prediction study to k lags using, in all cases, \(Z_{t}=Y_{t-1}\).

The following models are considered:

Series 1. Two independent AR(3) with constant trend:

$$Y_{t}=\varphi+ \begin{pmatrix} \mathcal{E}_{1,t}\\ \mathcal{E}_{2,t} \end{pmatrix}, $$


$$\begin{aligned} &\mathcal{E}_{1,t}=0.50\mathcal{E}_{1,t-1}-0.525\mathcal {E}_{1,t-2}+0.75\mathcal{E}_{1,t-3}+\eta_{1,t}, \\ &\mathcal{E}_{2,t}=0.1875\mathcal{E}_{2,t-1}-0.50\mathcal {E}_{2,t-2}+0.05\mathcal{E}_{2,t-3}+\eta_{2,t}, \end{aligned}$$

where \(\eta_{1,t}\sim N (0,0.25^{2} )\), \(\eta_{2,t}\sim N (0,0.10^{2} )\) and \(\varphi=(25,10)^{t}\).

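Series 2. VAR(3) with constant trend:

$$Y_{t}=\varphi+\Pi_{1}(Y_{t-1}-\varphi)+ \Pi_{2}(Y_{t-2}-\varphi)+\Pi _{3}(Y_{t-3}- \varphi)+\eta_{t}, $$

being

$$\Pi_{1}= \begin{pmatrix} 0.50 & 0.3150\\ 0.75 & 0.1875 \end{pmatrix},\quad \Pi_{2}= \begin{pmatrix} 0.525 & 0\\ 0 & 0.50 \end{pmatrix},\quad \Pi_{3}= \begin{pmatrix} 0.75 & 0.375\\ 0.50 & 0.050 \end{pmatrix},\quad \eta_{t}\sim N_{2} \left( \begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0.25^{2} & 0\\ 0 & 0.10^{2} \end{pmatrix} \right) $$

and φ as in Series 1.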

Series 3. NPVAR(1) with independent VAR(3) noise:

$$Y_{t}=\varphi (Y_{t-1} )+\mathcal{E}_{t}, $$

being

$$\varphi(y)= \begin{pmatrix} \varphi_{1}(y)\\ \varphi_{2}(y) \end{pmatrix} = \begin{pmatrix} 5\cos ( \vert y_{1} \vert )\\ 5\cos ( \vert y_{2} \vert ) \end{pmatrix} $$

and \(\mathcal{E}_{t}=\Pi_{1}\mathcal{E}_{t-1}+\Pi_{2}\mathcal {E}_{t-2}+\Pi_{3}\mathcal{E}_{t-3}+\eta_{t}\), where \(\Pi_{1}\), \(\Pi_{2}\), \(\Pi_{3}\) and \(\eta_{t}\) are similar to those of the previous series.

Series 4. VECM with constant trend:

$$Y_{t}=\varphi+ \begin{pmatrix} Y_{1,t}\\ Y_{2,t} \end{pmatrix}, $$

being \(Y_{1,t}=Y_{1,t-1}+v_{t}\) and \(Y_{2,t}=-Y_{1,t}+u_{t}\), where \(v_{t}\sim N (0,0.5^{2} )\), \(u_{t}=0.75 u_{t-1}+\eta_{t}\), \(\eta _{t}\sim N (0,0.5^{2} )\) and φ is similar to that of the first series.

In each case, 1000 bidimensional series of length 500 were generated from the models given above (\(Y_{1}^{i},\ldots,Y_{500}^{i}\) with \(1\leq i\leq1000\)). These values correspond to the generation after an initial period of stabilization (starting at zero and neglecting the first 500 values drawn). For every sample, \(M=500\) possible continuations of the series were obtained for k periods ahead \((Y_{500+k}^{i1},\ldots,Y_{500+k}^{i500})\), which were compared with the prediction that was made from the sample \(Y_{1}^{i},\ldots,Y_{500}^{i}\).
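As an example of how such samples can be generated, the sketch below simulates Series 4 with the burn-in described above. The function name and the seed are assumptions; the generation of the M continuations for each sample is not shown.

```python
import numpy as np

def simulate_series4(n, rng, burn_in=500, phi=(25.0, 10.0)):
    """Simulate n observations of Series 4 after discarding `burn_in` initial values."""
    total = n + burn_in
    v = rng.normal(0.0, 0.5, total)
    eta = rng.normal(0.0, 0.5, total)
    y1 = np.cumsum(v)                      # random walk: Y1_t = Y1_{t-1} + v_t
    u = np.zeros(total)
    for t in range(1, total):
        u[t] = 0.75 * u[t - 1] + eta[t]    # stationary AR(1) disturbance
    y2 = -y1 + u                           # cointegrated with y1 through u
    series = np.column_stack([phi[0] + y1, phi[1] + y2])
    return series[burn_in:]

rng = np.random.default_rng(5)
sample = simulate_series4(500, rng)
# The sum of the two components equals 35 + u_t and is therefore stationary.
combo = sample[:, 0] + sample[:, 1]
```

Each component wanders like a random walk, but their sum stays close to the constant 35, which is the cointegration relation that the VECM step is meant to exploit.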

For each of these series, three predictors are compared:

  (a) The nonparametric predictor using additive models with the estimation of each component independently (NPM).

  (b) The semiparametric predictor for additive models to estimate the trend of each bidimensional series component independently with model P1 for the residuals (SPM).

  (c) The semiparametric predictor for additive models to estimate the trend of each bidimensional series component independently with VAR modelling for the vector residuals proposed in the previous section as P2 (SPBM).

Thus, as noted above, denoting by \(Y_{1}^{i},\ldots,Y_{500}^{i}\), \(i=1,\ldots ,N=1000\), each of the simulated series and by \(\hat {Y}_{500+k}^{i(a)}\), \(\hat{Y}_{500+k}^{i(b)}\) and \(\hat {Y}_{500+k}^{i(c)}\), \(k=1,\ldots,30\), the predictors given by methods (a), (b) and (c), respectively, the methods are compared using the Mean Square Prediction Error:

$$ \operatorname{MSPE}(l)=\frac{1}{N}\sum_{i=1}^{N}{ \frac{1}{M}\sum_{j=1}^{M}{ \bigl(Y_{500+k}^{ij}-\hat{Y}_{500+k}^{i(l)} \bigr)^{2}}}, \tag{6} $$

where \(Y_{500+k}^{ij}\) represents the observed value of the jth prolongation of the ith series, \(j=1,\ldots,M=500\), \(l=a, b\) or c and \(k=1,\ldots,30\).
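For a single component and a single horizon k, the criterion can be computed as in the sketch below, where the array shapes and the small numerical example are assumptions made for illustration.

```python
import numpy as np

def mspe(continuations, predictions):
    """Mean square prediction error for one component and one horizon k.

    continuations: (N, M) array, entry (i, j) is the j-th continuation of series i
    predictions:   (N,) array, entry i is the point prediction for series i
    """
    continuations = np.asarray(continuations, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Each prediction is compared with all M continuations of its own series.
    return float(np.mean((continuations - predictions[:, None]) ** 2))

# Toy example with N = 2 series and M = 2 continuations each.
value = mspe([[1.0, 3.0], [2.0, 2.0]], [2.0, 2.0])
```

Because every series has the same number M of continuations, the overall mean equals the double average in the definition.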

The results are summarized in Tables 1 to 4. “\(\operatorname{MSPE}(a,b,c)\)” denotes the mean square prediction error (6) for the methods (a), (b) and (c), respectively. It can be seen that the proposed semiparametric method performs better than the other two, especially at the first lags, and that the differences between the three methods become smaller as the lags grow. This is illustrated in Fig. 3, which compares the distribution of the prediction errors obtained with the three predictors for the second model.

Figure 3

Boxplots of the prediction errors obtained with three predictors for the pure VAR model. (a) nonparametric predictor for each component independently, (b) semiparametric predictor for each component independently and (c) semiparametric predictor with VAR modelling of the residuals

Table 1 AR independent model (Series 1)

Real data application

The general model proposed in Sect. 3.1 was implemented for the particular case of predicting the levels of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) in the vicinity of the power station and the combined cycle station.

Let \(X_{t}\) be the bidimensional series formed by the one-hour mean series of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) at each minute t. In terms of equation (3), we consider \(Y_{t}=X_{t+\kappa}\) and \(Z_{t}= (X_{t},X_{t}-X_{t-5} )\). If \(\hat{X}_{i}\) denotes the observed values for past instants (\(i\leq t\)) and the best prediction for future instants (\(i>t\)), the aim is to predict \(X_{t+30}\) using the following algorithm:

  • At every instant t, \(\varphi (Z_{t} )\) is estimated with additive models and the information provided by the historical matrix, independently for each component. The estimate of φ is made 30 instants ahead: \(\dot{Y}_{t}=\dot{X}_{t+30}=\hat{\varphi }_{30} (Z_{t} )+\dot{e}_{t+30}\).

  • The residuals series \(\hat{e}_{t}\) is computed by \(\hat {e}_{t}=Y_{t}-\hat{\varphi}_{30} (Z_{t} )\) and a test of model adequacy is performed (for instance, the Ljung–Box test) for each component of the series concerning the last four hours (240 observations).

  • If any of the components of the residual series is not white noise, a test is performed to check whether the vector residual series is cointegrated. If this is the case, an adequate VECM is fitted. If the series is not cointegrated, a VAR model is fitted.

  • Thus \(\dot{e}_{t+30}\) is obtained.

  • The proposed final prediction given by the Semiparametric Bidimensional Model with the nonparametric part estimated at 30 instants (SPBM) is:

    $$\dot{X}_{t+30}=\hat{\varphi}_{30} (Z_{t} )+ \dot{e}_{t+30}. $$
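The steps of the algorithm can be illustrated with a minimal Python sketch. It is not the implementation used at the power station: the function names, the choice of 10 lags for the Ljung–Box statistic, the 5% level and the convention that \(\dot{e}_{t+30}=0\) when the residuals pass the white noise test are all assumptions made here for illustration. The additive model fit and the cointegration test are left as inputs supplied by the user.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]  # e.g. (SO2, NOx) one hour means at one minute


def build_pairs(x: Sequence[Vector], horizon: int = 30, lag: int = 5):
    """Return (Z_t, Y_t) pairs with Z_t = (X_t, X_t - X_{t-lag}), Y_t = X_{t+horizon}."""
    pairs = []
    for t in range(lag, len(x) - horizon):
        diff = tuple(a - b for a, b in zip(x[t], x[t - lag]))
        pairs.append(((x[t], diff), x[t + horizon]))
    return pairs


def ljung_box_q(e: Sequence[float], max_lag: int = 10) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n-k), k = 1..max_lag."""
    n = len(e)
    mean = sum(e) / n
    c0 = sum((v - mean) ** 2 for v in e)
    if c0 == 0:  # constant series: no autocorrelation to detect
        return 0.0
    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((e[i] - mean) * (e[i - k] - mean) for i in range(k, n))
        total += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * total


# 95% quantile of a chi-square with 10 degrees of freedom (assumes 10 lags
# and no correction for estimated parameters).
CHI2_95_DF10 = 18.307


def choose_residual_model(
    residuals: Sequence[Vector],
    is_cointegrated: Callable[[Sequence[Vector]], bool],
) -> str:
    """Decide how to model the last residuals (e.g. 240 observations).

    `is_cointegrated` is a hypothetical callable wrapping whichever
    cointegration test is preferred.
    """
    components: List[Tuple[float, ...]] = list(zip(*residuals))
    if all(ljung_box_q(c) <= CHI2_95_DF10 for c in components):
        return "none"  # assumption: white noise residuals need no correction
    return "VECM" if is_cointegrated(residuals) else "VAR"


def spbm_forecast(phi_hat_z: Vector, e_dot: Vector) -> Vector:
    """Final prediction: nonparametric part plus forecast residual."""
    return tuple(p + e for p, e in zip(phi_hat_z, e_dot))
```

In this sketch `choose_residual_model` only returns the label of the model to fit. The fitting itself and the 30 step ahead residual forecast would come from a time series library.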

To observe the behaviour of the prediction model, we have evaluated its performance on two episodes of air quality alteration, whose information has not been included in the historical matrix.

Figure 4 shows the forecasts given half an hour in advance by the proposed models for an episode depicted in one of the sampling stations. The forecasts track the real one hour means of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) closely, as Table 5 confirms. This table contains three measures of accuracy for the pure nonparametric predictor (NPM) and the proposed semiparametric predictor (SPBM), based on the following criteria:

Figure 4

Episode depicted in one of the sampling stations. Predictions given by the bidimensional semiparametric models (NPM and SPBM) for the one hour \(\mathrm{SO}_{2}\) mean

  (a) Squared error: \(\textrm{SE}=\sum_{t} (y_{t}-\hat {y}_{t} )^{2}\).

  (b) Absolute error: \(\textrm{AE}= \vert y_{t}-\hat{y}_{t} \vert \).

  (c) Relative absolute error (%): \(\textrm{RAE}=100 \vert \frac{y_{t}-\hat{y}_{t}}{y_{t}} \vert \).
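As an illustration, the three criteria can be computed per instant and then summarised by their mean and median. The sketch below is a hedged example, not the authors' code. The function name `error_summary` is hypothetical, the squared error is treated as a per-instant quantity so that a mean and a median can be taken, and instants with \(y_{t}=0\) are skipped in the relative error because the ratio is undefined there. Both choices are assumptions.

```python
from statistics import mean, median
from typing import Dict, List, Sequence, Tuple


def error_summary(
    y: Sequence[float], y_hat: Sequence[float]
) -> Dict[str, Tuple[float, float]]:
    """Return {criterion: (mean, median)} for SE, AE and RAE (%)."""
    se: List[float] = [(a - b) ** 2 for a, b in zip(y, y_hat)]
    ae: List[float] = [abs(a - b) for a, b in zip(y, y_hat)]
    # Assumption: instants with y_t == 0 are dropped from the relative error.
    rae: List[float] = [
        100.0 * abs((a - b) / a) for a, b in zip(y, y_hat) if a != 0
    ]
    summary = {}
    for name, values in (("SE", se), ("AE", ae), ("RAE", rae)):
        if values:
            summary[name] = (mean(values), median(values))
        else:  # nothing to summarise, e.g. every observed value was zero
            summary[name] = (float("nan"), float("nan"))
    return summary
```

Calling `error_summary(observed, predicted)` on the observed and forecast one hour means of an episode gives the pair (M, Md) for each criterion.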

The mean (M) and the median (Md) of these three measures have been computed for the period covering the pollution incident proper (02.00 to 10.00 hours). The \(\mathrm{SO}_{2}\) nonparametric prediction with the historical matrix captures the behaviour of the real series very well (RAE: 24.18%), and the semiparametric prediction is not able to improve on it (RAE: 27.15%). However, the \(\mathrm{NO}_{x}\) prediction given by SPBM (RAE: 21.35%) notably improves on the one obtained by NPM (RAE: 29.77%). Furthermore, the residual series was detected as cointegrated 123 times (8.37%), mainly when the highest values of the episode occur.

In another \(\mathrm{SO}_{2}\) episode depicted in one of the sampling stations (see Fig. 5), the behaviour of the predictors is somewhat different. The \(\mathrm{SO}_{2}\) prediction given by NPM (RAE: 43.92%) does not entirely capture the behaviour of the real series, so the semiparametric prediction (RAE: 38.48%) improves on that result, as shown in Table 6. In this episode, the \(\mathrm{NO}_{x}\) values are very low (practically zero) and therefore there are no cointegration relationships.

Figure 5

Episode depicted in one of the sampling stations. Predictions given by the bidimensional semiparametric models for the one hour \(\mathrm{SO}_{2}\) (left) and \(\mathrm{NO}_{x}\) (right) means


This paper reviews several prediction models that have been implemented over the years for the prediction of \(\mathrm{SO}_{2}\) in the vicinity of a power station. This evolution reflects the adaptation of the statistical models to changes in environmental regulations and to the availability of new technological resources that allow estimation in more complex situations.

The last part of the paper is devoted to a new proposal that, following the same philosophy as the previous univariate models, extends the semiparametric model to the multivariate framework. In particular, the paper deals with the joint prediction of \(\mathrm {SO}_{2}\) and \(\mathrm{NO}_{x}\) levels using natural extensions of the model in the univariate framework. These models, originally developed for financial applications, are successfully adapted to the environmental problem, showing good results in the simulation studies and in the real data application. The semiparametric joint predictor (SPBM) obtains results similar to those of the nonparametric (NPM) and the semiparametric independent (SPM) predictors in those scenarios where the components of the response are not related (see Table 1). Recall that predictors NPM and SPM are constructed under this assumption. In the scenarios with dependence among components, predictor SPBM clearly beats its competitors (see Tables 2–4), and it also shows good results in the real data application.

Table 2 VAR(3) model (Series 2)
Table 3 Model NPAR(1) with VAR(3) noise (Series 3)
Table 4 Pure VECM model (Series 4)
Table 5 \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) Forecast errors
Table 6 \(\mathrm{SO}_{2}\) Forecast errors



Integrated System of Statistical Prediction of the Inmision
ARIMA: Autoregressive Integrated Moving Average
VECM: Vector Error Correction Model
NPM: Nonparametric Model
SPM: Semiparametric Model
SPBM: Semiparametric Bidimensional Model
MSPE: Mean Square Prediction Error




The work by Wenceslao González-Manteiga and Manuel Febrero-Bande was partially supported by projects MTM2013-41383-P and MTM2016-76969-P from the Spanish Ministry of Science and Innovation and European Regional Development Fund and IAP network StUDyS from Belgian Science Policy.

Availability of data and materials

Please contact authors for data requests.


Over the last 20 years, Endesa Generation S.A. has signed several contracts with academic departments for the development of an integral program for contamination prevention around the facility. The three authors have been involved in some of those contracts under different roles.

Author information




The three authors contributed equally to this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Manuel Febrero-Bande.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Wenceslao González-Manteiga, Manuel Febrero-Bande and María Piñeiro-Lamas contributed equally to this work.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

González-Manteiga, W., Febrero-Bande, M. & Piñeiro-Lamas, M. Semiparametric prediction models for variables related with energy production. J.Math.Industry 8, 7 (2018).



  • Semiparametric prediction models
  • Pollution indicators
  • Cointegration