 Research
 Open Access
Semiparametric prediction models for variables related with energy production
 Wenceslao González-Manteiga†^{1, 2},
 Manuel Febrero-Bande†^{1, 2} and
 María Piñeiro-Lamas†^{3}
https://doi.org/10.1186/s13362-018-0049-0
© The Author(s) 2018
 Received: 28 March 2018
 Accepted: 2 August 2018
 Published: 23 August 2018
Abstract
This paper reviews the semiparametric models developed over the years through an extensive collaboration between the Department of Statistics and Operations Research of the University of Santiago de Compostela and a power station located in As Pontes (A Coruña, Spain), property of Endesa Generation, SA. In particular, these models were used to predict the levels of sulphur dioxide in the environment of this power station half an hour in advance. A new multidimensional semiparametric model is also considered. This model generalizes the previous ones and takes into account the correlation structure of the errors. Its behaviour is illustrated in a simulation study and with the prediction of the levels of two important pollution indicators in the environment of the power station: sulphur dioxide and nitrogen oxides.
Keywords
 Semiparametric prediction models
 Pollution indicators
 Cointegration
1 Introduction: an environmental problem
The coal-fired power station in As Pontes is one of the production centers owned by Endesa Generation SA in the Iberian Peninsula. It is located in the town of As Pontes de García Rodríguez, in the northeast of A Coruña province.
This power station was designed and built to make use of lignite from the mine located in its vicinity. This solid fuel was characterized by its high moisture and sulphur contents and its low calorific value. Throughout the years the plant has undergone several transformation processes in its facilities with the aim of reducing emissions of sulphur dioxide (\(\mathrm{SO}_{2}\)). The power station completed its last adaptation in 2008 to consume, as primary fuel, imported sub-bituminous coal, characterized by its low sulphur and ash contents.
The location of the power plant close to natural sites of high ecological value, such as the Natural Park As Fragas do Eume, together with existing legislation, has meant that there has been great concern about its impact on the environment from the beginning. Therefore the station has a Supplementary Control System of Air Quality that allows it to change operating conditions in order to reduce emissions when the weather conditions are adverse to the dispersion of the emitted smoke plume, which contains \(\mathrm{SO}_{2}\), and significant episodes of impaired air quality occur. Spanish law sets maximum concentrations that these gases may reach in a given period of time. In particular, for this plant the only limit that might be exceeded at any time is the one established on the hourly mean (continuously computed) of the \(\mathrm{SO}_{2}\) concentration at ground level: 350 μg/m^{3}.
The problem is to predict the future values of the \(\mathrm{SO}_{2}\) levels using the information received continuously at the sampling stations together with past information. Statistical forecast models are the key to obtaining these predictions and suggesting a course of action to the plant operators.
In recent years, new statistical models have been designed to predict two pollution indicators simultaneously, owing to changes in the environmental legislation and in the power station itself, and to the construction of a new natural gas combined cycle station in the vicinity. Given the fuels that are going to be used, the main interest lies in predicting the values of nitrogen oxides (\(\mathrm{NO}_{x}\)), which are emitted by both facilities, simultaneously with the values of \(\mathrm{SO}_{2}\), which is emitted only by the power station.
All these changes have created a new problem: predicting hourly mean concentrations of sulphur dioxide and nitrogen oxides, measured in the environment of the two facilities. For this new problem, statistical forecast models are again an effective tool. Thus, a general multidimensional prediction model is designed (see Sect. 3).
2 Methods: one-dimensional predictive models
2.1 Models designed to solve the environmental problem
As a result of the collaboration over the past years between the Department of Statistics and Operations Research at the University of Santiago de Compostela and the Environment Section of the power station, the Integrated System of Statistical Prediction of the Immission (SIPEI, in Spanish) has been created. It employs statistical models to provide predictions of the levels of \(\mathrm{SO}_{2}\) with a half-hour horizon.
For this purpose, a kind of memory called the Historical Matrix was designed (Prada-Sánchez and Febrero-Bande [14]), which is essential to the behaviour of all the models developed so far. This matrix is composed of a large number of vectors of the form \((X_{t-l},\ldots,X_{t},X_{t+k})\), real data of bihourly \(\mathrm{SO}_{2}\) or \(\mathrm{NO}_{x}\) means, chosen so as to cover the full range of the variable in question and to play the role of historical memory. To ensure that the entire range of the variable is covered, the matrix is divided into blocks according to the level of the response variable, \(X_{t+k}\). The memory is updated at every instant: when a new observation is received, the class to which it belongs is found, the oldest datum in that class leaves the matrix, and the new observation enters it. A sample built this way always contains up-to-date information on the full range of variation of the variable of interest, and over the years this concept has been adapted to the different statistical techniques used.
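The update rule of the Historical Matrix can be sketched as follows. This is only an illustrative sketch: the class name `HistoricalMatrix`, the `breakpoints` that separate the response classes and the `block_size` are assumptions made for the example, not values taken from SIPEI.

```python
from collections import deque


class HistoricalMatrix:
    """Class-stratified memory: one first-in-first-out block per response level."""

    def __init__(self, breakpoints, block_size):
        # breakpoints: increasing response levels separating the classes (hypothetical)
        self.breakpoints = list(breakpoints)
        self.blocks = [deque(maxlen=block_size)
                       for _ in range(len(self.breakpoints) + 1)]

    def class_of(self, response):
        # Index of the first breakpoint that the response does not reach
        for i, b in enumerate(self.breakpoints):
            if response < b:
                return i
        return len(self.breakpoints)

    def update(self, lags, response):
        # A full deque drops its oldest vector when the new one enters
        self.blocks[self.class_of(response)].append((tuple(lags), response))

    def sample(self):
        # All stored (lags, response) vectors, block by block
        return [row for block in self.blocks for row in block]
```

Because each block has a fixed size, frequent low readings cannot push out the rare high-concentration episodes, which is the purpose of stratifying by the level of the response.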
2.1.1 The first semiparametric model
In the early years of development, data were transmitted to SIPEI every five minutes, and the legislation in force at that time established the limit values for the two-hour mean of \(\mathrm{SO}_{2}\). For this reason, the prediction models for \(\mathrm{SO}_{2}\) levels initially worked with series of bihourly means. The objective was to obtain the prediction, with a half-hour horizon, for this time series. Therefore, each time a new observation \(X_{t}\) is received, the value six steps ahead, \(X_{t+6}\), has to be predicted.
In particular, at each time t, the regression function \(\varphi_{6}(X_{t},X_{t-1})=\mathbb{E}(X_{t+6}\mid X_{t},X_{t-1})\) is estimated with the well-known Nadaraya–Watson kernel estimator (see Nadaraya [13] and Watson [19]) using the information provided by the historical matrix. The second step is to compute the residual time series \(\hat{Z}_{t-64},\ldots,\hat{Z}_{t}\) over the last six hours, where \(\hat{Z}_{i}=X_{i}-\hat{\mathbb{E}}(X_{i}\mid X_{i-6},X_{i-7})\) for each i, and to fit an appropriate ARIMA model to it. Finally, the Box–Jenkins prediction \(\hat{Z}_{t+6}\) is obtained. The final point prediction proposed is \(\hat{\mathbb{E}}(X_{t+6}\mid X_{t},X_{t-1})+\hat{Z}_{t+6}\).
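The two-step scheme above can be sketched in a few lines. This is a simplified sketch under stated assumptions: a Gaussian product kernel with a single bandwidth `h` is used, and a least-squares AR(1) fit stands in for the ARIMA identification step, which in practice would be richer. The function names are illustrative, and `X`, `Y` stand for the predictor pairs and responses stored in the historical matrix.

```python
import math


def nw_estimate(x0, X, Y, h):
    """Nadaraya-Watson estimate of E(Y | X = x0) with a Gaussian product kernel."""
    weights = [math.exp(-0.5 * sum(((a - b) / h) ** 2 for a, b in zip(x0, x)))
               for x in X]
    return sum(w * y for w, y in zip(weights, Y)) / sum(weights)


def ar1_forecast(z, steps):
    """Least-squares AR(1) fit (stand-in for the ARIMA step), forecast `steps` ahead."""
    num = sum(a * b for a, b in zip(z[1:], z[:-1]))
    den = sum(a * a for a in z[:-1])
    phi = num / den if den > 0 else 0.0
    return (phi ** steps) * z[-1]


def semiparametric_forecast(series, X, Y, h, horizon=6):
    """Kernel trend estimate plus the forecast of the residual series."""
    # Residuals Z_i = X_i - E^(X_i | X_{i-horizon}, X_{i-horizon-1})
    resid = [series[i] - nw_estimate((series[i - horizon], series[i - horizon - 1]), X, Y, h)
             for i in range(horizon + 1, len(series))]
    trend = nw_estimate((series[-1], series[-2]), X, Y, h)
    return trend + ar1_forecast(resid, horizon)
```

If the kernel estimate already explains the series, the residuals are close to zero and the forecast reduces to the nonparametric trend.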
2.1.2 Partially linear model
The previous semiparametric models use only the past of the time series to obtain the predictions; however, it might be useful to introduce additional information in order to improve them. Specifically, meteorological and emission variables have been used with the so-called partially linear models (Prada-Sánchez et al. [15]) to estimate bihourly mean values of \(\mathrm{SO}_{2}\) one hour in advance.
Data of the form \((V_{t},Z_{t},Y_{t})\) are considered, where \(V_{t}\) is a vector of exogenous variables, \(Z_{t}=(X_{t},X_{t-l})\) and \(Y_{t}=X_{t+12}\), with \(X_{t}\) the series of bihourly \(\mathrm{SO}_{2}\) means. It is assumed that these data follow the partially linear model \(Y_{t}=V^{t}_{t}\beta+\varphi(Z_{t})+\epsilon_{t}\), where \(\epsilon_{t}\) is a zero-mean error term.
This model can easily be estimated following Speckman [18] and allows the horizon to be extended to one hour while maintaining the same level of accuracy as the semiparametric model for the half-hour horizon. In any case, the incorporation of external information improves the prediction only slightly, because the measuring point for the meteorological variables is located 80 m above ground level, which is far from (and so uncorrelated with) the typical height of the emitted smoke plume (above 800 m over ground level). Emission information is also of little interest because these signals are almost constant, especially when the facility is working, and do not describe at all the reasons that make the smoke plume fall to the ground. For these reasons, meteorological and emission information was not considered in the following models.
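A Speckman-type estimator of the partially linear model can be sketched as follows. To keep the example short it assumes a scalar exogenous variable V and a scalar Z, whereas the model above uses vectors; the Gaussian kernel, the bandwidth `h` and the function names are also assumptions of the sketch.

```python
import math


def smooth(z0, Z, values, h):
    """Nadaraya-Watson smoother of `values` against scalar Z, evaluated at z0."""
    w = [math.exp(-0.5 * ((z0 - z) / h) ** 2) for z in Z]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


def speckman_fit(V, Z, Y, h):
    """Estimate beta and phi in Y = V * beta + phi(Z) + error (scalar V and Z)."""
    # Remove from V and Y the part explained by Z, then regress residual on residual
    v_res = [v - smooth(z, Z, V, h) for v, z in zip(V, Z)]
    y_res = [y - smooth(z, Z, Y, h) for y, z in zip(Y, Z)]
    beta = sum(a * b for a, b in zip(v_res, y_res)) / sum(a * a for a in v_res)
    # Nonparametric part: smooth what is left once the linear term is removed
    partial = [y - beta * v for y, v in zip(Y, V)]

    def phi_hat(z0):
        return smooth(z0, Z, partial, h)

    return beta, phi_hat
```

The prediction for a new pair (v, z) is then `beta * v + phi_hat(z)`.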
2.1.3 Neural networks
The change in the series of interest established by the European Council Directive 1999/30/CE, from bihourly means to hourly means, makes the time series less smooth. Initially, the previous semiparametric model was adapted to work on the new series of hourly means. The results showed a considerable increase in the variability of the predictions compared with the results usually obtained for the series of two-hour means.
In an attempt to improve the response given by SIPEI, and in particular its point predictions with a half-hour horizon, new predictors based on neural network models were developed (Fernández de Castro et al. [6]).
A neural network model was designed to provide predictions of one-hour mean values of \(\mathrm{SO}_{2}\) half an hour in advance. It consists of an input layer, one hidden layer and an output layer. The number of nodes in the output layer is determined by the size of the response to be obtained from the network; in this case there is a single node, since the interest lies in a prediction for \(X_{t+6}\). The input to the network is the bidimensional vector \((X_{t-3},X_{t})\); the nodes in the hidden layer use a logistic activation function, and the output layer uses the identity function.
The weights \(\{\omega_{j1}^{h},\omega_{j2}^{h},\omega_{1j}^{o}; j=1,\ldots ,L\}\) and the bias terms \(\{\theta_{j}^{h}; j=1,\ldots,L\}\) are determined during the training process, as is the final number L of hidden layer nodes, which is chosen as the value for which the network provides the best results after training networks with identical architecture and different values of L. The training set of the neural network is built from the historical matrices introduced above, suitably adapted.
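The forward pass of such a network is short enough to write out. The sketch below assumes that each \(\theta_{j}^{h}\) enters additively inside the logistic function (the sign convention is not stated in the text) and that the output node has no bias term, since none is listed among the parameters; the function names are illustrative.

```python
import math


def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict(x, w_hidden, theta_hidden, w_out):
    """Forward pass: 2 inputs -> L logistic hidden nodes -> 1 identity output.

    x            : input pair (X_{t-3}, X_t)
    w_hidden     : list of L pairs (omega_j1^h, omega_j2^h)
    theta_hidden : list of L bias terms theta_j^h (added inside the activation)
    w_out        : list of L output weights omega_1j^o
    """
    hidden = [logistic(w1 * x[0] + w2 * x[1] + th)
              for (w1, w2), th in zip(w_hidden, theta_hidden)]
    # Identity activation in the output layer: a plain weighted sum
    return sum(wo * hj for wo, hj in zip(w_out, hidden))
```

Training (for instance by gradient descent on the squared error over the historical matrix) is not shown here.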
2.1.4 Functional data model
The one-hour mean values of \(\mathrm{SO}_{2}\) can be treated as observations of a stochastic process in continuous time. The interest is, as discussed above, in prediction with a half-hour horizon, so each curve is the interpolated data over half an hour. In this case the curves were obtained from six consecutive five-minute observations, which are the sampling points of each functional datum. Therefore, we use random variables with values in the Hilbert space \(H=L^{2}([0,6])\) of the form \(X_{t}(u)=x(6t+u)\).
The following statistical model is considered: \(X_{t}=\rho (X_{t-1})+\epsilon_{t}\), where \(\epsilon_{t}\) is a Hilbertian strong white noise and \(\rho:H\to H\) is the operator to be estimated. For the estimation of ρ, a functional kernel estimator has been used in the autoregressive Hilbertian framework of order one. Furthermore, the concept of historical matrix has been conveniently adapted to the case where the data are curves (Fernández de Castro et al. [5]).
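A functional kernel estimate of \(\rho\) at the last observed curve can be sketched as a weighted average of the curves that followed similar past curves. The sketch assumes that each curve is stored as a list of its sampled values, that the \(L^{2}\) distance is approximated on that grid, and that a Gaussian kernel with bandwidth `h` is used; these choices and the function names are illustrative.

```python
import math


def l2_distance(f, g):
    # Discrete approximation of the L2 distance on the common sampling grid
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def predict_next_curve(curves, h):
    """Kernel estimate of rho(X_n) built from the consecutive pairs (X_{i-1}, X_i)."""
    last = curves[-1]
    weights = [math.exp(-0.5 * (l2_distance(prev, last) / h) ** 2)
               for prev in curves[:-1]]
    total = sum(weights)
    # Pointwise weighted average of the successor curves
    return [sum(w * nxt[j] for w, nxt in zip(weights, curves[1:])) / total
            for j in range(len(last))]
```

Past curves that are close to the current one dominate the average, so the predicted curve resembles what followed them.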
2.1.5 Other approaches designed to predict probabilities
The models described so far provide point predictions of \(\mathrm{SO}_{2}\), but other techniques have also been developed in order to predict probabilities. The aim of these alternative models is to estimate the probability that the series of bihourly \(\mathrm{SO}_{2}\) measurements exceeds a certain level r one hour in advance; namely, in our case, we predict \(\mathbb{P} (Z_{t} )=\mathbb{P} (X_{t+12}>r\mid Z_{t} )\), where \(Z_{t}= (X_{t},X_{t}-X_{t-3} )\). To do so, additive models with an unknown link function (Roca-Pardiñas et al. [17]) have been used.
More complex generalized additive models (GAM) with second-order interaction terms have also been considered (Roca-Pardiñas et al. [16]). These authors showed that the GAM with interactions detects the onset of episodes earlier than the GAM without interactions.
2.2 Alternative one-dimensional models: additive models
In the statistical literature there is a wide range of one-dimensional models which can be used to predict the levels of \(\mathrm{SO}_{2}\). We will focus on the techniques we will use in the next section to construct our multidimensional model: additive models for continuous response.
A generalized kernel nonparametric estimation can be given using smooth backfitting for the functions \(m_{1},\ldots,m_{q}\) (see again the above mentioned papers).
All the models described above usually require the selection of a regularization parameter (the bandwidth in kernel smoothing, the number of neurons in the hidden layer for neural networks, …). This parameter was calibrated using cross-validation techniques with the information of the updated Historical Matrix.
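As an illustration of this calibration step, the following sketch selects a kernel bandwidth by leave-one-out cross-validation. It assumes a scalar predictor, a Gaussian kernel and a user-supplied grid of candidate bandwidths; the exact cross-validation variant used with the Historical Matrix is not specified in the text, so this is only one plausible instance.

```python
import math


def loo_cv_score(X, Y, h):
    """Mean squared leave-one-out error of a Nadaraya-Watson fit with bandwidth h."""
    err = 0.0
    for i, (xi, yi) in enumerate(zip(X, Y)):
        # Weight zero for the left-out observation itself
        w = [0.0 if j == i else math.exp(-0.5 * ((xi - xj) / h) ** 2)
             for j, xj in enumerate(X)]
        total = sum(w)
        if total == 0.0:
            return math.inf  # bandwidth too small to reach any neighbour
        pred = sum(wj * yj for wj, yj in zip(w, Y)) / total
        err += (yi - pred) ** 2
    return err / len(X)


def select_bandwidth(X, Y, grid):
    """Candidate bandwidth with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(X, Y, h))
```

The same loop applies to any other regularization parameter, such as the number of hidden nodes, by replacing the kernel fit with the corresponding model.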
3 Methods: multidimensional semiparametric prediction
The new goal is to incorporate the prediction of \(\mathrm{NO}_{x}\) half an hour in advance while continuing to obtain the predictions of \(\mathrm{SO}_{2}\), as already mentioned. The idea is to generalize the one-dimensional semiparametric approach proposed by García-Jurado et al. [8], taking into account the correlation structure of the vector series to be predicted.
3.1 The model
 P1. Each \(\mathcal{E}_{k,t}\) is a stationary AR(\(p_{k}\)) process of the form
$$\mathcal{E}_{k,t}=\sum_{i=1}^{p_{k}}{ \phi_{k}^{i}\mathcal{E}_{k,t-i}+\xi_{k,t}} \quad \mbox{for all } t\in\mathbb{Z}, k=1,\ldots,r, $$
independent of \(Z_{t}\), where \(\xi_{k,t}\) is a white noise process with variance \(\sigma_{k}^{2}\), for \(k=1,\ldots,r\).

 P2. \(\mathcal{E}_{t}\) has a VAR(p) structure of the form
$$\mathcal{E}_{t}=\sum_{i=1}^{p}{ \Phi_{i}\mathcal{E}_{t-i}+\xi_{t}} \quad \mbox{for all } t\in\mathbb{Z}, $$
independent of \(Z_{t}\), where the \(\Phi_{i}\) are fixed (\(r\times r\)) coefficient matrices and \(\xi_{t}\) is an r-dimensional white noise process, i.e. \(\mathbb{E}(\xi_{t})=0\), \(\mathbb{E}(\xi_{t} \xi '_{t})=\Sigma_{\xi}\) and \(\mathbb{E}(\xi_{t} \xi'_{s})=0\) for \(t\neq s\).
3.2 Estimations
 P1. The parameters \(\phi_{k}=(\phi_{k}^{1},\ldots,\phi_{k}^{p_{k}})\) of the error process \(\{\mathcal{E}_{k,t} \}\) are estimated by standard maximum likelihood methods. In particular, we use a conditional maximum likelihood estimator for every component of the form
$$\hat{\phi}_{k}=\operatorname{arg} \max_{\phi_{k}\in\Phi} \hat{l}(\phi_{k}), $$
where Φ is a compact parameter space and \(\hat{l}\) is the conditional log-likelihood given by
$$\hat{l} \bigl(\phi_{k},\sigma_{k}^{2} \bigr)= -\frac{T}{2}\log(2\pi)+\frac{1}{2}\log \bigl(\sigma_{k}^{-2} \bigr)-\frac{1}{2}\sum_{t=p_{k}+1}^{T} \bigl( \bigl(\hat{\mathcal{E}}_{k,t}-\hat{\mathcal{E}}_{k,t}( \phi_{k}) \bigr)/\sigma_{k} \bigr)^{2} $$
with \(\hat{\mathcal{E}}_{k,t}(\phi_{k})=\sum_{i=1}^{p_{k}}{\phi_{k}^{i}\hat{\mathcal{E}}_{k,t-i}}\).
 P2. The coefficient matrices \((\Phi_{1},\ldots,\Phi_{p})\) of the r-dimensional error process \(\{\mathcal{E}_{t} \}\) are also estimated by generalized maximum likelihood methods (Hamilton [10]). First, we need to establish the following notation: let \(\Phi^{t} = [ \Phi_{1}\:\Phi_{2}\: \ldots\: \Phi_{p} ]\) denote the \((r\times rp)\) coefficient matrix, and let \(X_{t}\) be an \((rp\times1)\) vector containing p lags of each of the elements of \(\mathcal{E}_{t}\): \(X_{t}^{t}=[\mathcal{E}^{t}_{t-1}\:\mathcal{E}^{t}_{t-2}\:\ldots\:\mathcal{E}^{t}_{t-p}]\).
The theoretical conditional log-likelihood function to be optimized has the following expression:
$$l(\Phi, \Sigma_{\xi})=-\frac{rT}{2}\log(2\pi)+\frac{r}{2} \log \bigl\vert \Sigma_{\xi}^{-1} \bigr\vert - \frac{1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\mathcal{E}_{t}-\Phi^{t}X_{t} \bigr)^{t}\:\Sigma_{\xi}^{-1} \bigl( \mathcal{E}_{t}-\Phi^{t}X_{t} \bigr) \bigr]}. $$
Thus the conditional log-likelihood is:
$$\hat{l}(\hat{\Phi}, \hat{\Sigma}_{\xi})=-\frac{rT}{2}\log(2\pi )+ \frac{r}{2}\log \bigl\vert \hat{\Sigma}_{\xi}^{-1} \bigr\vert -\frac{1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\hat{\mathcal{E}}_{t}-\hat{\Phi}^{t}\hat{X}_{t} \bigr)^{t} \: \hat{\Sigma}_{\xi}^{-1} \bigl(\hat{\mathcal{E}}_{t}-\hat{\Phi}^{t} \hat{X}_{t} \bigr) \bigr]}. $$
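For a Gaussian white noise, maximizing the conditional log-likelihood of P1 over \(\phi_{k}\) amounts to least squares on the lagged residuals, so the componentwise estimator can be sketched without a numerical optimizer. The sketch below assumes a zero-mean residual series and solves the normal equations with a small hand-written elimination routine; the function names are illustrative.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ar(resid, p):
    """Conditional ML (least squares under Gaussian noise) for a zero-mean AR(p)."""
    # Each row holds the p lagged values used to explain resid[t]
    rows = [[resid[t - i] for i in range(1, p + 1)] for t in range(p, len(resid))]
    target = resid[p:]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    phi = solve(A, b)
    sse = sum((y - sum(f * v for f, v in zip(phi, r))) ** 2
              for r, y in zip(rows, target))
    return phi, sse / len(target)  # coefficients and innovation variance estimate
```

For P2 the same least-squares idea applies equation by equation to the vector of lagged residuals \(X_{t}\) (see Hamilton [10]).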
3.3 Other considerations: the phenomenon of cointegration
Sometimes the vector processes are cointegrated, so one has to take into account the structure of correlation between the series. The notion of cointegration has been one of the most important concepts in time series analysis since Granger [9] and Engle and Granger [3] formally developed it. It has broad applications in the analysis of economic data and has given rise to many publications in the economic literature.
The vector β is called the cointegration vector. This vector is not unique, since for any nonzero scalar c the linear combination \(c\beta^{t}Y_{t}=\beta^{*t}Y_{t}\sim I(0)\). Therefore, a normalization is often assumed to identify a unique β. A typical normalization is \(\beta=(1,\beta_{2},\ldots,\beta_{r})^{t}\).
3.4 Prediction scheme
 1. At every instant t, \(\varphi_{\kappa}(Z_{t})\) is estimated with the smooth backfitting technique, independently for each of the r components, using the data \((Y_{l},Z_{l-\kappa})\), \(l=\kappa+1,\ldots,T\).
 2. The residual series \(\hat{\mathcal{E}}_{t+\kappa}\) is computed by
$$\hat{\mathcal{E}}_{t+\kappa}=Y_{t+\kappa}-\hat{\varphi}_{\kappa }(Z_{t}), \quad t=1,\ldots,T-\kappa. $$
 3. The next step is to fit an appropriate model to the error structure (VECM) and to obtain the prediction κ instants ahead: \(\dot{\mathcal{E}}_{T+\kappa}\).
 4. The proposed final prediction is given by (4).
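Steps 3 and 4 can be illustrated for a single component under the simpler error structure P1. The sketch below assumes that the AR coefficients `phi` have already been estimated and that the nonparametric trend estimate is available as a number; it iterates the AR recursion κ steps ahead and adds the result to the trend. A VECM or VAR fit would replace `ar_forecast` in the vector case. The function names are illustrative.

```python
def ar_forecast(resid, phi, steps):
    """Iterate a fitted zero-mean AR(p) recursion `steps` instants ahead."""
    hist = list(resid)
    for _ in range(steps):
        # phi[0] multiplies the most recent value, phi[1] the one before, ...
        hist.append(sum(f * hist[-i] for i, f in enumerate(phi, start=1)))
    return hist[-1]


def final_prediction(trend_estimate, resid, phi, steps):
    """Nonparametric trend plus the forecast of the residual process."""
    return trend_estimate + ar_forecast(resid, phi, steps)
```

With `phi = [0.5]` and last residual 2.0, two steps ahead give 2.0 × 0.5 × 0.5 = 0.5, which is then added to the trend estimate.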
4 Results and discussion
4.1 A simulation study
To analyze the behaviour of the proposed prediction procedure, a simulation study was performed, generating samples from artificial series and carrying out a prediction study k lags ahead using, in all cases, \(Z_{t}=Y_{t-1}\).
 Series 1. Two independent AR(3) with constant trend:
$$Y_{t}=\varphi+ \begin{pmatrix} \mathcal{E}_{1,t}\\ \mathcal{E}_{2,t} \end{pmatrix}, $$
being
$$\begin{aligned} &\mathcal{E}_{1,t}=0.50\mathcal{E}_{1,t-1}-0.525\mathcal{E}_{1,t-2}+0.75\mathcal{E}_{1,t-3}+\eta_{1,t}, \\ &\mathcal{E}_{2,t}=0.1875\mathcal{E}_{2,t-1}-0.50\mathcal{E}_{2,t-2}+0.05\mathcal{E}_{2,t-3}+\eta_{2,t}, \end{aligned}$$
where \(\eta_{1,t}\sim N (0,0.25^{2} )\), \(\eta_{2,t}\sim N (0,0.10^{2} )\) and \(\varphi=(25,10)^{t}\).

 Series 2. VAR(3) with constant trend:
$$Y_{t}=\varphi+\Pi_{1}(Y_{t-1}-\varphi)+ \Pi_{2}(Y_{t-2}-\varphi)+\Pi_{3}(Y_{t-3}-\varphi)+\eta_{t}, $$
being \(\Pi_{1}=\begin{pmatrix}0.50 & 0.3150\\ 0.75 & 0.1875\end{pmatrix}\), \(\Pi_{2}=\begin{pmatrix}0.525 & 0\\ 0 & 0.50\end{pmatrix}\), \(\Pi_{3}=\begin{pmatrix}0.75 & 0.375\\ 0.50 & 0.050\end{pmatrix}\), \(\eta_{t}\sim N_{2} \bigl(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}0.25^{2} & 0\\ 0 & 0.10^{2}\end{pmatrix} \bigr)\) and φ as in Series 1.

 Series 3. NPVAR(1) with independent VAR(3) noise:
$$Y_{t}=\varphi (Y_{t-1} )+\mathcal{E}_{t}, $$
being \(\varphi(y)=\begin{pmatrix}\varphi_{1}(y)\\ \varphi_{2}(y)\end{pmatrix}=\begin{pmatrix}5\cos(y_{1})\\ 5\cos(y_{2})\end{pmatrix}\) and \(\mathcal{E}_{t}=\Pi_{1}\mathcal{E}_{t-1}+\Pi_{2}\mathcal{E}_{t-2}+\Pi_{3}\mathcal{E}_{t-3}+\eta_{t}\), where \(\Pi_{1}\), \(\Pi_{2}\), \(\Pi_{3}\) and \(\eta_{t}\) are similar to those of the previous series.

 Series 4. VECM with constant trend:
$$Y_{t}=\varphi+ \begin{pmatrix} Y_{1,t}\\ Y_{2,t} \end{pmatrix}, $$
being \(Y_{1,t}=Y_{1,t-1}+v_{t}\) and \(Y_{2,t}=Y_{1,t}+u_{t}\), where \(v_{t}\sim N (0,0.5^{2} )\), \(u_{t}=0.75 u_{t-1}+\eta_{t}\), \(\eta_{t}\sim N (0,0.5^{2} )\) and φ is similar to that of the first series.
In each case, 1000 bidimensional series of length 500 were generated from the models given above (\(Y_{1}^{i},\ldots,Y_{500}^{i}\) with \(1\leq i\leq1000\)). These values correspond to the generation after an initial period of stabilization (starting at zero and neglecting the first 500 values drawn). For every sample, \(M=500\) possible continuations of the series were obtained for k periods ahead \((Y_{500+k}^{i1},\ldots,Y_{500+k}^{i500})\), which were compared with the prediction that was made from the sample \(Y_{1}^{i},\ldots,Y_{500}^{i}\).
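The generation of one sample can be sketched for Series 4, the cointegrated case, including the initial stabilization period. The seed handling and function names are assumptions of the sketch, and `mspe` shows one plausible way of comparing a prediction with the M simulated continuations, since the exact formula behind the reported MSPE is not spelled out in the text.

```python
import random


def simulate_series4(n, burn_in=500, seed=0):
    """Series 4: random walk Y1 and Y2 = Y1 + AR(1) noise, shifted by (25, 10)."""
    rng = random.Random(seed)
    y1, u = 0.0, 0.0  # start at zero, as in the simulation design
    out = []
    for t in range(burn_in + n):
        y1 += rng.gauss(0.0, 0.5)            # Y_{1,t} = Y_{1,t-1} + v_t
        u = 0.75 * u + rng.gauss(0.0, 0.5)   # u_t = 0.75 u_{t-1} + eta_t
        if t >= burn_in:                     # discard the stabilization period
            out.append((25.0 + y1, 10.0 + y1 + u))
    return out


def mspe(prediction, continuations):
    """Mean squared error of one prediction against M simulated continuations."""
    return sum((c - prediction) ** 2 for c in continuations) / len(continuations)
```

Both components share the random walk \(Y_{1,t}\), so their difference is stationary, which is what makes the pair cointegrated.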
The following predictors were compared:
 (a) The nonparametric predictor using additive models with the estimation of each component independently (NPM).
 (b) The semiparametric predictor that uses additive models to estimate the trend of each component of the bidimensional series independently, with model P1 for the residuals (SPM).
 (c) The semiparametric predictor that uses additive models to estimate the trend of each component of the bidimensional series independently, with the VAR modelling of the vector residuals proposed in the previous section as P2 (SPBM).
AR independent model (Series 1)
Lags | MSPE(a) Var. 1 | MSPE(a) Var. 2 | MSPE(b) Var. 1 | MSPE(b) Var. 2 | MSPE(c) Var. 1 | MSPE(c) Var. 2
1 | 0.1637 | 0.0134 | 0.0646 | 0.0103 | 0.0640 | 0.0102
2 | 0.1704 | 0.0135 | 0.0819 | 0.0107 | 0.0804 | 0.0105
3 | 0.1735 | 0.0134 | 0.0856 | 0.0129 | 0.0845 | 0.0127
10 | 0.1701 | 0.0134 | 0.1595 | 0.0137 | 0.1480 | 0.0135
20 | 0.1690 | 0.0134 | 0.2140 | 0.0141 | 0.1695 | 0.0135
30 | 0.1689 | 0.0134 | 0.2696 | 0.0147 | 0.1733 | 0.0135
4.2 Real data application
The general model proposed in Sect. 3.1 was implemented for the particular case of the prediction of the levels of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) in the vicinity of the power station and the combined cycle station.

At every instant t, \(\varphi (Z_{t} )\) is estimated with additive models and the information provided by the historical matrix, independently for each component. The estimate of φ is computed for 30 instants ahead: \(\dot{Y}_{t}=\dot{X}_{t+30}=\hat{\varphi }_{30} (Z_{t} )+\dot{e}_{t+30}\).

The residual series \(\hat{e}_{t}\) is computed by \(\hat{e}_{t}=Y_{t}-\hat{\varphi}_{30} (Z_{t} )\) and a test of model adequacy (for instance, the Ljung–Box test) is performed for each component of the series over the last four hours (240 observations).

If any of the components of the residual series is not white noise, a test is performed to explore whether the vector residual series is cointegrated. If this is the case, an adequate VECM is fitted. If the series is not cointegrated, a VAR model is fitted.

Thus \(\dot{e}_{t+30}\) is obtained.

The proposed final prediction given by the Semiparametric Bidimensional Model with the nonparametric part estimated at 30 instants (SPBM) is:
$$\dot{X}_{t+30}=\hat{\varphi}_{30} (Z_{t} )+ \dot{e}_{t+30}. $$
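The adequacy check on the residuals can be illustrated with the Ljung–Box statistic, \(Q=n(n+2)\sum_{k=1}^{m}\hat{\rho}_{k}^{2}/(n-k)\), computed on each component. The sketch below returns only the statistic; the number of lags `max_lag` is an assumption left to the user, and the comparison with a chi-square quantile is done separately.

```python
def ljung_box_statistic(resid, max_lag):
    """Ljung-Box Q statistic for the first `max_lag` sample autocorrelations."""
    n = len(resid)
    m = sum(resid) / n
    c0 = sum((r - m) ** 2 for r in resid)
    q = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((resid[t] - m) * (resid[t - k] - m) for t in range(k, n))
        q += (ck / c0) ** 2 / (n - k)  # squared autocorrelation over (n - k)
    return n * (n + 2) * q
```

With a single lag, a value above 3.841 (the 5% critical value of a chi-square distribution with one degree of freedom) would indicate that the residual component is not white noise, which is the condition that triggers the VECM or VAR fit.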
To observe the behaviour of the prediction model, we have evaluated its performance on two episodes of air quality alteration, whose information has not been included in the historical matrix.
The following error criteria were used:
 (a) Squared error: \(\textrm{SE}=\sum_{t} (y_{t}-\hat{y}_{t} )^{2}\).
 (b) Absolute error: \(\textrm{AE}= \vert y_{t}-\hat{y}_{t} \vert \).
 (c) Relative absolute error (%): \(\textrm{RAE}=100{ \vert \frac{y_{t}-\hat{y}_{t}}{y_{t}} \vert }\).
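These criteria are simple to compute from the observed and predicted values. The sketch below evaluates each criterion instant by instant and summarizes it by its mean and median; reading the M and Md columns of the forecast error tables as mean and median is an assumption, since the abbreviations are not defined in the text.

```python
from statistics import mean, median


def forecast_errors(y, y_hat):
    """Per-instant SE, AE and RAE (%), each summarized as (mean, median)."""
    se = [(a - b) ** 2 for a, b in zip(y, y_hat)]
    ae = [abs(a - b) for a, b in zip(y, y_hat)]
    rae = [100.0 * abs((a - b) / a) for a, b in zip(y, y_hat)]  # needs y != 0
    return {name: (mean(v), median(v))
            for name, v in (("SE", se), ("AE", ae), ("RAE", rae))}
```

The relative error is undefined when an observed value is zero, so such instants would have to be excluded beforehand.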
5 Conclusions
This paper reviews several models that have been implemented over the years for the prediction of \(\mathrm{SO}_{2}\) in the vicinity of a power station. This evolution reflects the adaptation of the statistical models to changes in environmental regulations and to the availability of new technological resources that allow estimation in more complex situations.
VAR(3) model (Series 2)
Lags | MSPE(a) Var. 1 | MSPE(a) Var. 2 | MSPE(b) Var. 1 | MSPE(b) Var. 2 | MSPE(c) Var. 1 | MSPE(c) Var. 2
1 | 0.1689 | 0.2299 | 0.0709 | 0.0513 | 0.0644 | 0.0103
2 | 0.1965 | 0.2782 | 0.0950 | 0.0773 | 0.0842 | 0.0494
3 | 0.1963 | 0.2650 | 0.0979 | 0.1295 | 0.0852 | 0.0673
10 | 0.2066 | 0.2803 | 0.1898 | 0.2455 | 0.1794 | 0.2313
20 | 0.2101 | 0.2824 | 0.2060 | 0.2856 | 0.2058 | 0.2732
30 | 0.2114 | 0.2828 | 0.2103 | 0.2875 | 0.2106 | 0.2813

Model NPAR(1) with VAR(3) noise (Series 3)
Lags | MSPE(a) Var. 1 | MSPE(a) Var. 2 | MSPE(b) Var. 1 | MSPE(b) Var. 2 | MSPE(c) Var. 1 | MSPE(c) Var. 2
1 | 1.3530 | 1.0066 | 1.2277 | 0.9937 | 1.2281 | 0.9895
2 | 6.4228 | 6.8469 | 6.3825 | 6.8125 | 6.3962 | 6.8373
3 | 12.4008 | 14.9944 | 12.5475 | 15.0671 | 12.5544 | 15.0604
10 | 19.9809 | 19.8974 | 20.0418 | 20.0296 | 20.0456 | 20.0320
20 | 20.5862 | 20.0169 | 20.6387 | 20.1749 | 20.6455 | 20.1776
30 | 20.8909 | 19.6885 | 20.9619 | 19.8530 | 20.9679 | 19.8559

Pure VECM model (Series 4)
Lags | MSPE(a) Var. 1 | MSPE(a) Var. 2 | MSPE(b) Var. 1 | MSPE(b) Var. 2 | MSPE(c) Var. 1 | MSPE(c) Var. 2
1 | 6.9370 | 8.8101 | 0.3426 | 3.0539 | 0.2699 | 2.6542
2 | 16.8852 | 18.7127 | 2.9435 | 5.8296 | 2.8263 | 5.1088
3 | 25.5642 | 27.4790 | 7.1160 | 10.2645 | 6.9875 | 9.3126
10 | 47.1114 | 49.5927 | 23.1711 | 28.1485 | 21.3344 | 24.0975
20 | 51.5780 | 54.1854 | 33.4512 | 41.7827 | 25.4007 | 28.2645
30 | 54.3195 | 56.9187 | 46.7582 | 59.7599 | 28.0551 | 30.8976
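\(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) Forecast errors
Model | \(\mathrm{SO}_{2}\) SE M | \(\mathrm{SO}_{2}\) SE Md | \(\mathrm{SO}_{2}\) AE M | \(\mathrm{SO}_{2}\) AE Md | \(\mathrm{SO}_{2}\) RAE M | \(\mathrm{SO}_{2}\) RAE Md | \(\mathrm{NO}_{x}\) SE M | \(\mathrm{NO}_{x}\) SE Md | \(\mathrm{NO}_{x}\) AE M | \(\mathrm{NO}_{x}\) AE Md | \(\mathrm{NO}_{x}\) RAE M | \(\mathrm{NO}_{x}\) RAE Md
SPBM | 1265.28 | 635.65 | 27.91 | 25.21 | 27.15 | 18.33 | 15.83 | 8.87 | 3.25 | 2.97 | 21.35 | 9.28
NPM | 1043.23 | 372.02 | 24.20 | 19.29 | 24.18 | 15.99 | 30.35 | 18.28 | 4.42 | 4.28 | 29.77 | 12.17

\(\mathrm{SO}_{2}\) Forecast errors
Model | SE M | SE Md | AE M | AE Md | RAE M | RAE Md
SPBM | 5782.18 | 2566.27 | 62.40 | 50.66 | 38.48 | 16.31
NPM | 5833.85 | 3055.29 | 64.17 | 55.27 | 43.92 | 16.39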
Notes
Declarations
Acknowledgements
The work by Wenceslao González-Manteiga and Manuel Febrero-Bande was partially supported by projects MTM2013-41383-P and MTM2016-76969-P from the Spanish Ministry of Science and Innovation and the European Regional Development Fund, and by the IAP network StUDyS from the Belgian Science Policy.
Availability of data and materials
Please contact authors for data requests.
Funding
Over the last 20 years, Endesa Generation S.A. has signed several contracts with academic departments for the development of an integral program for contamination prevention around the facility. The three authors have been involved in some of those contracts under different roles.
Authors’ contributions
The three authors contributed equally to this paper. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 1. Box G, Jenkins M, Reinsel C. Time series analysis: forecasting and control. New York: Wiley; 2008.
 2. Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models. Ann Stat. 1989;17:453–510.
 3. Engle RF, Granger CWJ. Cointegration and error correction: representation, estimation and testing. Econometrica. 1987;57:251–76.
 4. Fernández de Castro B, González-Manteiga W. Boosting for real and functional samples: an application to an environmental problem. Stoch Environ Res Risk Assess. 2008;22(1):27–37.
 5. Fernández de Castro B, Guillas S, González-Manteiga W. Functional samples and bootstrap for predicting sulfur dioxide levels. Technometrics. 2005;47(2):212–22.
 6. Fernández de Castro B, Prada-Sánchez J, González-Manteiga W, Febrero-Bande M, Bermúdez Cela J, Hernández Fernández J. Prediction of SO2 levels using neural networks. J Air Waste Manage Assoc. 2003;53(5):532–9.
 7. Friedman J, Stuetzle W. Projection pursuit regression. J Am Stat Assoc. 1981;76(376):817–23.
 8. García-Jurado I, González-Manteiga W, Prada-Sánchez J, Febrero-Bande M, Cao R. Predicting using Box–Jenkins, nonparametric, and bootstrap techniques. Technometrics. 1995;37(3):303–10.
 9. Granger C. Cointegrated variables and error-correcting models. PhD thesis, Discussion Paper 83-13. Department of Economics, University of California at San Diego; 1983.
 10. Hamilton JD. Time series analysis. vol. 2. Princeton: Princeton University Press; 1994.
 11. Johansen S. Statistical analysis of cointegration vectors. J Econ Dyn Control. 1988;12(2):231–54.
 12. Mammen E, Linton O, Nielsen J. The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann Stat. 1999;27(5):1443–90.
 13. Nadaraya EA. On estimating regression. Theory Probab Appl. 1964;9(1):141–2.
 14. Prada-Sánchez J, Febrero-Bande M. Parametric, nonparametric and mixed approaches to prediction of sparsely distributed pollution incidents: a case study. J Chemom. 1997;11(1):13–32.
 15. Prada-Sánchez J, Febrero-Bande M, Cotos-Yáñez T, González-Manteiga W, Bermúdez-Cela J, Lucas-Domínguez T. Prediction of SO2 pollution incidents near a power station using partially linear models and an historical matrix of predictor-response vectors. Environmetrics. 2000;11(2):209–25.
 16. Roca-Pardiñas J, Cadarso-Suárez C, González-Manteiga W. Testing for interactions in generalized additive models: application to SO2 pollution data. Stat Comput. 2005;15(4):289–99.
 17. Roca-Pardiñas J, González-Manteiga W, Febrero-Bande M, Prada-Sánchez J, Cadarso-Suárez C. Predicting binary time series of SO2 using generalized additive models with unknown link function. Environmetrics. 2004;15(7):729–42.
 18. Speckman P. Kernel smoothing in partial linear models. J R Stat Soc, Ser B, Stat Methodol. 1988;50:413–36.
 19. Watson GS. Smooth regression analysis. Sankhya, Ser A. 1964;26:359–72.