Semiparametric prediction models for variables related with energy production
Journal of Mathematics in Industry, volume 8, Article number: 7 (2018)
Abstract
This paper reviews the semiparametric models developed over the years through an extensive collaboration between the Department of Statistics and Operations Research of the University of Santiago de Compostela and a power station located in As Pontes (A Coruña, Spain), property of Endesa Generation, SA. In particular, these models were used to predict the levels of sulphur dioxide in the environment of this power station half an hour in advance. A new multidimensional semiparametric model is also considered. This model generalizes the previous ones and takes into account the correlation structure of the errors. Its behaviour is illustrated in a simulation study and in the prediction of the levels of two important pollution indicators in the environment of the power station: sulphur dioxide and nitrogen oxides.
Introduction: an environmental problem
The coal-fired power station in As Pontes is one of the production centres owned by Endesa Generation SA in the Iberian Peninsula. It is located in the town of As Pontes de García Rodríguez, in the northeast of the A Coruña province.
This power station was designed and built to make use of lignite from the mine located in its vicinity. This solid fuel was characterized by its high moisture and sulphur content and its low calorific value. Over the years the plant has undergone several transformation processes in its facilities with the aim of reducing emissions of sulphur dioxide (\(\mathrm{SO}_{2}\)). The power station completed its last adaptation in 2008 to consume, as primary fuel, imported sub-bituminous coal, characterized by its low sulphur and ash content.
The location of the power plant close to natural sites of high ecological value, such as the Natural Park As Fragas do Eume, together with the existing legislation, means that there has been great concern about its impact on the environment since the beginning. The station therefore operates a Supplementary Control System of Air Quality that allows it to change operating conditions in order to reduce emissions when the weather conditions are adverse to the dispersion of the emitted smoke plume, which contains \(\mathrm{SO}_{2}\), and significant episodes of impaired air quality occur. Spanish law, by rules and regulations, sets maximum concentrations that may be reached for these gases in a given period of time. In particular, for this plant the only limit that might be exceeded at any time is the one established on the hourly mean (computed continuously) of the ground-level concentration of \(\mathrm{SO}_{2}\): 350 μg/m³.
The problem is to predict, using the information received continuously at the sampling stations together with past information, the future values of the \(\mathrm{SO}_{2}\) levels. Statistical forecast models are the key to obtaining these predictions and to suggesting a course of action to the plant operators.
In recent years, new statistical models have been designed to obtain the simultaneous prediction of two pollution indicators in the environment, due to changes in the environmental legislation, in the power station itself, and the construction of a new natural gas combined cycle station in the vicinity. Because of the fuels to be used, the main interest lies in predicting the values of the nitrogen oxides (\(\mathrm{NO}_{x}\)), which are emitted by both facilities, simultaneously with the values of \(\mathrm{SO}_{2}\), which is emitted only by the power station.
All these changes have created a new problem: predicting hourly mean concentrations of sulphur dioxide and nitrogen oxides, measured in the environment of the two facilities. Faced with this new scenario, statistical forecast models are again an effective tool. Thus, a general multidimensional prediction model is designed (see Sect. 3).
Methods: one-dimensional predictive models
Models designed to solve the environmental problem
As a result of the collaboration over the past years between the Department of Statistics and Operations Research at the University of Santiago de Compostela and the Environment Section of the power station, the Integrated System of Statistical Prediction of the Immission (SIPEI, from its Spanish initials) was created, employing statistical models to provide predictions of the \(\mathrm{SO}_{2}\) levels with a half-hour horizon.
Due to the real-time availability of data at one-minute frequency and to the current legislation, the hourly mean is considered for both the \(\mathrm{SO}_{2}\) and the \(\mathrm{NO}_{x}\) values in order to predict the future values of both pollutants. Thus, two time series are constructed, \(X_{1,t}\) and \(X_{2,t}\), where the subscript t represents a one-minute instant, and each value is an average of the actual values over the last hour:
$$X_{1,t}=\frac{1}{60}\sum_{i=0}^{59}{\mathrm{SO}_{2}(t-i)}, \qquad X_{2,t}=\frac{1}{60}\sum_{i=0}^{59}{\mathrm{NO}_{x}(t-i)}, $$
where \(\mathrm{SO}_{2}(t)\) and \(\mathrm{NO}_{x}(t)\) represent the concentration of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\), respectively, at time t, measured in \(\mu g/m^{3}\).
The series of hourly \(\mathrm{SO}_{2}\) means has a characteristic behaviour, highly influenced by weather conditions and local topography. It takes values close to zero for long periods of time, and it can suddenly and sharply increase (episodes) under meteorological conditions that are unfavourable to the dispersion of the smoke plume. Nowadays, the series of hourly \(\mathrm{NO}_{x}\) means behaves similarly to that of \(\mathrm{SO}_{2}\), but on a smaller scale (see Fig. 1). The main objective of the developed statistical models is to predict the episodes, so our interest is centred on the values that occur less frequently along the time series.
Because of this, a kind of memory called the Historical Matrix was designed (Prada-Sánchez and Febrero-Bande [14]), which has been essential to the behaviour of all the models developed so far. This matrix is composed of a large number of vectors of the form \((X_{t-l},\ldots,X_{t},X_{t+k})\): real data of two-hour \(\mathrm{SO}_{2}\) or \(\mathrm{NO}_{x}\) means, chosen so as to cover the full range of the variable in question and to play the role of a historical memory. To ensure that the entire range of the variable is covered, the matrix is divided into blocks according to the level of the response variable, \(X_{t+k}\). To update the memory, every time a new observation is received, the historical matrix is renewed in the following way: the class to which the new observation belongs is found, the oldest datum in that class leaves the matrix and the new observation enters it. A sample built this way guarantees permanently updated information on the full variation range of the variable of interest, and over the years this concept has been adapted to the different statistical techniques used.
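The update rule just described can be sketched as follows. This is only an illustration: the class name, the block boundaries and the block sizes are hypothetical, not those of the SIPEI implementation.

```python
import numpy as np
from collections import deque

class HistoricalMatrix:
    """Stratified memory of vectors (X_{t-l}, ..., X_t, X_{t+k}).

    Rows are grouped into blocks by the level of the response X_{t+k};
    within each block the oldest row leaves first when a new one enters.
    """

    def __init__(self, breaks, rows_per_block):
        # `breaks` are illustrative cut points defining the response classes,
        # e.g. [50, 150, 350] gives 4 blocks for a pollutant in ug/m3.
        self.breaks = breaks
        self.blocks = [deque(maxlen=rows_per_block)
                       for _ in range(len(breaks) + 1)]

    def update(self, row):
        # The class is found from the response value (last component);
        # appending to a full deque automatically drops its oldest row.
        response = row[-1]
        idx = int(np.searchsorted(self.breaks, response))
        self.blocks[idx].append(np.asarray(row, dtype=float))

    def sample(self):
        # All stored rows, covering the full range of the response.
        return np.vstack([r for b in self.blocks for r in b])
```

Using a bounded deque per block makes the "oldest datum leaves, new one enters" rule a one-line operation.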
The first semiparametric model
In the early years of development, the frequency of data transmission to the SIPEI was one observation every five minutes and, moreover, the legislation in force at that time established the limit values on the two-hour mean of the \(\mathrm{SO}_{2}\). For this reason, the prediction models for the \(\mathrm{SO}_{2}\) levels initially worked with series of two-hour means. The objective was to obtain the prediction of this time series with a half-hour horizon. Therefore, each time a new observation \(X_{t}\) is received, the value six steps ahead, \(X_{t+6}\), has to be predicted.
A semiparametric approach was considered (García-Jurado et al. [8]) which generalizes the traditional Box–Jenkins models as follows:
$$X_{t+6}=\varphi_{6} (X_{t},X_{t-1} )+Z_{t+6}, $$
where \(Z_{t}\) has an ARIMA structure of mean zero independent of \(X_{t}\) (Box et al. [1]).
In particular, at each time t, the regression function \(\varphi_{6}(X_{t},X_{t-1})=\mathbb{E}(X_{t+6}/X_{t},X_{t-1})\) is estimated with the well-known Nadaraya–Watson kernel-type estimator (see Nadaraya [13] and Watson [19]) using the information provided by the historical matrix. The second step is to compute the residual time series \(\hat{Z}_{t-64},\ldots,\hat{Z}_{t}\) relative to the last six hours, where \(\hat{Z}_{i}=X_{i}-\hat{\mathbb{E}}(X_{i}/X_{i-6},X_{i-7})\) for each i, and to fit an appropriate ARIMA model to it. Finally, the Box–Jenkins prediction \(\hat{Z}_{t+6}\) is obtained. The proposed final point prediction is given by \(\hat{\mathbb{E}}(X_{t+6}/X_{t},X_{t-1})+\hat{Z}_{t+6}\).
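A minimal sketch of this two-step predictor follows. It assumes a Gaussian kernel, an AR(1) residual model fitted by least squares in place of a general ARIMA, and an illustrative bandwidth; none of these choices come from the original system.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x0, h):
    """Kernel regression estimate of E[Y | X = x0] (Gaussian kernel)."""
    d = (x_train - x0) / h                      # (n, q) scaled differences
    w = np.exp(-0.5 * np.sum(d * d, axis=1))    # product Gaussian kernel
    return np.sum(w * y_train) / np.sum(w)

def semiparametric_forecast(x, horizon=6, h=1.0):
    """Predict x[t + horizon] from the covariates (x[t], x[t-1])."""
    t = len(x) - 1
    # Pairs ((x[i], x[i-1]), x[i + horizon]) available in the sample.
    Z = np.column_stack([x[1:t - horizon + 1], x[0:t - horizon]])
    Y = x[1 + horizon:t + 1]
    # Step 1: nonparametric trend at every sample point and at time t.
    trend = np.array([nadaraya_watson(Z, Y, z, h) for z in Z])
    trend_now = nadaraya_watson(Z, Y, np.array([x[t], x[t - 1]]), h)
    # Step 2: AR(1) fit to the residual series by least squares.
    resid = Y - trend
    phi = np.dot(resid[1:], resid[:-1]) / np.dot(resid[:-1], resid[:-1])
    # AR(1) prediction `horizon` steps ahead: phi**horizon * last residual.
    return trend_now + phi ** horizon * resid[-1]
```

In the SIPEI the nonparametric step would draw its training pairs from the historical matrix rather than from the raw recent sample.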
Partially linear model
The information used by the previous semiparametric models to obtain the predictions is the past of the time series; however, it might be useful to introduce additional information in order to improve these predictions. Specifically, meteorological and emission variables have been used within the so-called partially linear models (Prada-Sánchez et al. [15]) to estimate the two-hour mean values of \(\mathrm{SO}_{2}\) one hour in advance.
Data of the form \((V_{t},Z_{t},Y_{t})\) are considered, where \(V_{t}\) is a vector of exogenous variables, \(Z_{t}=(X_{t},X_{t-1})\) and \(Y_{t}=X_{t+12}\), with \(X_{t}\) the series of two-hour \(\mathrm{SO}_{2}\) means; and it is assumed that this series conforms to the following partially linear model: \(Y_{t}=V^{t}_{t}\beta+\varphi(Z_{t})+\epsilon_{t}\), where \(\epsilon_{t}\) is a zero-mean error term.
This model can be easily estimated following Speckman [18] and allows us to extend the horizon to one hour while maintaining the same level of accuracy as the semiparametric model with a half-hour horizon. In any case, the incorporation of external information only slightly improves the prediction, because the measuring point for the meteorological variables is located 80 m above ground level, relatively far away from (and thus uncorrelated with) the typical height of the emitted smoke plume (above 800 m over ground level). Emission information is also of little interest because these signals are almost constant while the facility is working, and they do not describe at all the conditions that make the smoke plume fall to the ground. For these reasons, meteorological and emission information was not considered in the subsequent models.
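Speckman's two-step idea can be sketched as follows: smooth out the effect of \(Z_{t}\) from both the response and the exogenous variables, estimate β on what remains, and then smooth the linear-part residuals to recover φ. The kernel, bandwidth and data below are illustrative, not those used for the plant.

```python
import numpy as np

def kernel_smoother_matrix(z, h):
    """S with S[i, j] = Nadaraya-Watson weight of observation j at z_i."""
    d = (z[:, None, :] - z[None, :, :]) / h
    k = np.exp(-0.5 * np.sum(d * d, axis=2))
    return k / k.sum(axis=1, keepdims=True)

def speckman_fit(v, z, y, h=0.5):
    """Partially linear fit y = v @ beta + phi(z) + eps, Speckman-style."""
    s = kernel_smoother_matrix(z, h)
    v_tilde = v - s @ v           # partial out the smooth effect of z on V
    y_tilde = y - s @ y           # ... and on Y
    beta = np.linalg.lstsq(v_tilde, y_tilde, rcond=None)[0]
    phi_hat = s @ (y - v @ beta)  # smooth the linear-part residuals
    return beta, phi_hat
```

On synthetic data with a known β the estimator recovers the linear coefficients up to smoothing bias.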
Neural networks
The change in the series of interest established by the European Council Directive 1999/30/EC, from two-hour means to hourly means, makes the time series less smooth. At first, the previous semiparametric model was adapted to work on the new series of hourly means. The results showed a considerable increase in the variability of the given predictions with respect to the results usually obtained for the series of two-hour means.
In an attempt to improve the response given by the SIPEI, and in particular its point predictions with a half-hour horizon, new predictors based on neural network models were developed (Fernández de Castro et al. [6]).
A neural network model has been designed to provide predictions of the one-hour mean values of \(\mathrm{SO}_{2}\) half an hour in advance. It consists of an input layer, one hidden layer and an output layer. The number of nodes in the output layer is determined by the size of the response to be obtained from the network; in this case a single node, since we are interested in a prediction of \(X_{t+6}\). The two-dimensional vector \((X_{t-3},X_{t})\) has been taken as input to the network; the nodes in the hidden layer use a logistic activation function and the output layer uses the identity function.
The predictor given by the neural network has the following expression:
$$\hat{X}_{t+6}=\sum_{j=1}^{L}{\omega_{1j}^{o} f_{j}^{h} \bigl(\omega_{j1}^{h}X_{t-3}+\omega_{j2}^{h}X_{t}+\theta_{j}^{h} \bigr)}, $$
with \(f_{j}^{h}(z)=\frac{1}{1+e^{-z}}\).
The weights \(\{\omega_{j1}^{h},\omega_{j2}^{h},\omega_{1j}^{o}; j=1,\ldots,L\}\) and the biases \(\{\theta_{j}^{h}; j=1,\ldots,L\}\) are determined during the training process, as well as the final number L of hidden-layer nodes, which is chosen as the value for which the neural network provides the best results, after training networks with identical architecture and different values of L. To design the training set of the neural network, the historical matrices introduced above, suitably adapted, have been used.
Figure 2 shows the forecasts given half an hour in advance by the neural network with 50 nodes in its hidden layer for an episode recorded at one of the measuring stations. The good behaviour of the forecast (dotted line) can easily be seen. The procedures based on neural networks accurately predict the real one-hour mean \(\mathrm{SO}_{2}\) air quality values (solid line). These models were later optimized with boosting learning techniques (Fernández de Castro and González-Manteiga [4]).
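The forward pass of such a network (one hidden layer, logistic activations, identity output) amounts to the sketch below. The weights here are random placeholders; in the SIPEI they come from training on the historical matrices.

```python
import numpy as np

def logistic(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def nn_predict(x_lag3, x_now, w_h, theta_h, w_o):
    """One-hidden-layer predictor of X_{t+6} from (X_{t-3}, X_t).

    w_h     : (L, 2) input-to-hidden weights
    theta_h : (L,)   hidden biases
    w_o     : (L,)   hidden-to-output weights (identity output node)
    """
    hidden = logistic(w_h @ np.array([x_lag3, x_now]) + theta_h)
    return float(w_o @ hidden)
```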
Functional data model
The one-hour mean values of \(\mathrm{SO}_{2}\) can be treated as observations of a stochastic process in continuous time. The interest, as discussed above, is to predict with a half-hour horizon, so each curve corresponds to half an hour of the series. In this case, the curves were obtained by taking six consecutive five-minute observations as the sampling points of each functional datum. Therefore, we use random variables with values in the Hilbert space \(H=L^{2}([0,6])\), of the form \(X_{t}(u)=x(6t+u)\).
The following statistical model is considered: \(X_{t}=\rho (X_{t-1})+\epsilon_{t}\), where \(\epsilon_{t}\) is a Hilbertian strong white noise and \(\rho:H\to H\) is the operator to be estimated. For the estimation of ρ, a functional kernel estimator has been used in the framework of autoregressive Hilbertian processes of order one. Furthermore, the concept of historical matrix has been conveniently adapted to the case where the data are curves (Fernández de Castro et al. [5]).
Other approaches designed to predict probabilities
The models described so far provide point predictions of \(\mathrm{SO}_{2}\), but other techniques have also been developed in order to predict probabilities. The aim of these alternative models is to estimate the probability that the series of two-hour \(\mathrm{SO}_{2}\) means exceeds a certain level r one hour in advance; namely, in our case, we predict \(p (Z_{t} )=\mathbb{P} (X_{t+12}>r/Z_{t} )\) with \(Z_{t}= (X_{t},X_{t}-X_{t-3} )\). To do so, additive models with an unknown link function (Roca-Pardiñas et al. [17]) have been used.
More complex generalized additive models (GAM) with second-order interaction terms have also been considered (Roca-Pardiñas et al. [16]). It has been shown that the GAM with interactions detects the onset of episodes earlier than the GAM without them.
Alternative one-dimensional models: additive models
In the statistical literature there is a wide range of one-dimensional models which can be used to predict the levels of \(\mathrm{SO}_{2}\). We will focus on the techniques we will use in the next section to construct our multidimensional model: additive models for continuous response.
There have been a number of proposals for fitting additive models. Friedman and Stuetzle [7] introduced the backfitting algorithm and Buja et al. [2] studied its properties. Mammen et al. [12] proposed the so-called smooth backfitting by employing projection arguments. Let \(\{ (Y_{t},Z_{t}) \}_{t=1}^{T}\) be a random sample of a strictly stationary time series, with \(Y_{t}\) one-dimensional and \(Z_{t}\) q-dimensional, following the model:
$$Y_{t}=m (Z_{t} )+\epsilon_{t}, $$
where \(\{\epsilon_{t} \}\) is a white noise process and \(\mathbb{E}[\epsilon_{t}Z_{t}]=0\).
Typically, it is assumed that the function m is additive with component functions \(m_{j}\), for \(j=0,\ldots,q\), thus
$$m (Z_{t} )=m_{0}+\sum_{j=1}^{q}{m_{j} (Z_{j,t} )}. $$
A generalized kernel nonparametric estimate of the functions \(m_{1},\ldots,m_{q}\) can be obtained using smooth backfitting (see again the papers mentioned above).
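Smooth backfitting needs extra projection machinery, but the classical backfitting algorithm of Friedman and Stuetzle conveys the idea and is sketched below, with a Nadaraya–Watson smoother per covariate; the bandwidth and iteration count are illustrative.

```python
import numpy as np

def smooth(z, r, h):
    """Nadaraya-Watson smooth of residuals r against one covariate z."""
    d = (z[:, None] - z[None, :]) / h
    k = np.exp(-0.5 * d * d)
    return (k @ r) / k.sum(axis=1)

def backfit(Z, y, h=0.3, n_iter=20):
    """Fit y = m0 + sum_j m_j(Z[:, j]) + eps by classical backfitting."""
    n, q = Z.shape
    m0 = y.mean()
    m = np.zeros((q, n))                    # values m_j(Z_{j,t}) at the data
    for _ in range(n_iter):
        for j in range(q):
            # Partial residuals: remove the constant and the other components.
            partial = y - m0 - m.sum(axis=0) + m[j]
            m[j] = smooth(Z[:, j], partial, h)
            m[j] -= m[j].mean()             # identifiability: centre each m_j
    return m0, m
```

Each sweep smooths the partial residuals against one covariate at a time; centring each component keeps the decomposition identifiable.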
In all the models described above it is usually necessary to select a regularization parameter (the bandwidth in kernel smoothing, the number of neurons in the hidden layer for neural networks, …). The calibration of this parameter was carried out using cross-validation techniques with the information of the updated Historical Matrix.
Methods: multidimensional semiparametric prediction
The new goal is to incorporate the prediction of \(\mathrm{NO}_{x}\) half an hour in advance, while continuing to obtain the predictions of \(\mathrm{SO}_{2}\), as mentioned above. The idea is to generalize the one-dimensional semiparametric approach proposed by García-Jurado et al. [8], taking into account the correlation structure of the vectorial series to be predicted.
The model
Let \((Y, Z )= (Y_{l}, Z_{l} )\), \(l=0,\pm1,\pm 2,\ldots\) be a vectorial strictly stationary time series, where \(Y_{l}\) is an r-dimensional response series and \(Z_{l}\) is a q-dimensional covariate series, and let \(\{(Y_{t},Z_{t}) \}_{t=1}^{T}\) be a random sample of \((Y,Z )\). The following model is considered:
$$Y_{t}=\varphi (Z_{t} )+\mathcal{E}_{t}, $$
where \(Y_{t}= (Y_{1,t},\ldots,Y_{r,t} )^{t}\), \(Z_{t}= (Z_{1,t},\ldots,Z_{q,t} )^{t}\) and \(\mathcal{E}_{t}= (\mathcal {E}_{1,t},\ldots,\mathcal{E}_{r,t} )^{t}\). Let us consider two possible structures for the multidimensional residuals series:
 P1.:

Each \(\mathcal{E}_{k,t}\) is a stationary AR(\(p_{k}\)) process of the form
$$\mathcal{E}_{k,t}=\sum_{i=1}^{p_{k}}{ \phi_{k}^{i}\mathcal{E}_{k,t-i}+\xi_{k,t}} \quad \mbox{for all } t\in\mathbb{Z}, k=1,\ldots,r $$independent of \(Z_{t}\), where \(\xi_{k,t}\) is a white noise process with variance \(\sigma_{k}^{2}\), for \(k=1,\ldots,r\).
 P2.:

\(\mathcal{E}_{t}\) has a VAR(p) structure of the form
$$\mathcal{E}_{t}=\sum_{i=1}^{p}{ \Phi_{i}\mathcal{E}_{t-i}+\xi_{t}} \quad \mbox{for all } t\in\mathbb{Z}, $$independent of \(Z_{t}\), where the \(\Phi_{i}\) are fixed (\(r\times r\)) coefficient matrices and \(\xi_{t}\) is an r-dimensional white noise process, i.e. \(\mathbb{E}(\xi_{t})=0\), \(\mathbb{E}(\xi_{t} \xi'_{t})=\Sigma_{\xi}\) and \(\mathbb{E}(\xi_{t} \xi'_{s})=0\) for \(t\neq s\).
Our main objective is to predict \(Y_{t}\), κ instants ahead, using a sample of size T. The prediction of \(Y_{t+\kappa}\) is then defined by
$$\dot{Y}_{t+\kappa}=\hat{\varphi}_{\kappa} (Z_{t} )+\dot{\mathcal{E}}_{t+\kappa}, $$
where \(\hat{\varphi}_{\kappa}(Z_{t})\) is a nonparametric estimate of \(\varphi_{\kappa}(Z_{t})=\mathbb{E} [Y_{t+\kappa}/Z_{t} ]\) and \(\dot{\mathcal{E}}_{t+\kappa}\) is the prediction given, κ instants ahead, for the residual series constructed as \(\hat{\mathcal{E}}_{t+\kappa}=Y_{t+\kappa}-\hat{\varphi}_{\kappa}(Z_{t})\).
Estimations
We suppose that model (3) holds. The first step is to obtain a nonparametric estimate of φ independently for each of the r components of \(Y_{t}\): \({\varphi}(Z_{t})=({\varphi}_{1}(Z_{t}),\ldots,{\varphi}_{r}(Z_{t}))\). Furthermore, we assume that the functions \(\varphi_{k}\) are additive with component functions \(\varphi_{k}^{j}\), for \(k=1,\ldots,r\) and \(j=0,\ldots,q\), thus
$$\varphi_{k} (Z_{t} )=\varphi_{k}^{0}+\sum_{j=1}^{q}{\varphi_{k}^{j} (Z_{j,t} )}, \quad k=1,\ldots,r. $$
Therefore, r additive models with q covariates are estimated using the smooth backfitting technique. We have to take into account that the process \(\mathcal{E}_{t}\) is not observable, since the function φ is not known. Thus, we have to replace \(\mathcal{E}_{t}\) by the residuals
$$\hat{\mathcal{E}}_{t}=Y_{t}-\hat{\varphi} (Z_{t} ), $$
and use these approximations to \(\mathcal{E}_{t}\) in the maximum likelihood estimations later defined.
To estimate the parametric part of the model, we must consider the two possible error structures proposed above:
 P1.:

The parameters \(\phi_{k}=(\phi_{k}^{1},\ldots,\phi_{k}^{p_{k}})\) of the error process \(\{\mathcal{E}_{k,t} \}\) are estimated by standard maximum likelihood methods. In particular, we use a conditional maximum likelihood estimator for every component of the form
$$\hat{\phi}_{k}=\operatorname{arg} \max_{\phi_{k}\in\Phi} \hat{l}(\phi_{k}), $$where Φ is a compact parameter space and l̂ is the conditional log-likelihood given by
$$\hat{l} \bigl(\phi_{k},\sigma_{k}^{2} \bigr)=-\frac{T}{2}\log(2\pi)-\frac{1}{2}\log \bigl(\sigma_{k}^{2} \bigr)-\frac{1}{2}\sum_{t=p_{k}+1}^{T} \bigl( \bigl(\hat{\mathcal{E}}_{k,t}-\hat{\mathcal{E}}_{k,t}( \phi_{k}) \bigr)/\sigma_{k} \bigr)^{2} $$with \(\hat{\mathcal{E}}_{k,t}(\phi_{k})=\sum_{i=1}^{p_{k}}{\phi_{k}^{i}\hat{\mathcal{E}}_{k,t-i}}\).
 P2.:

The coefficient matrices \((\Phi_{1},\ldots,\Phi_{p})\) of the r-dimensional error process \(\{\mathcal{E}_{t} \}\) are also estimated by generalized maximum likelihood methods (Hamilton [10]). First, we need to establish the following notation: let \(\Phi^{t} = [ \Phi_{1}\:\Phi_{2}\: \ldots\: \Phi_{p} ]\) denote the \((r\times rp)\) coefficient matrix, and let \(X_{t}\) be a \((rp\times1)\) vector containing p lags of each of the elements of \(\mathcal{E}_{t}\): \(X_{t}^{t}=[\mathcal{E}^{t}_{t-1}\:\mathcal{E}^{t}_{t-2}\:\ldots\:\mathcal{E}^{t}_{t-p}]\).
The theoretical conditional loglikelihood function to be optimized has the following expression:
$$l(\Phi, \Sigma_{\xi})=-\frac{rT}{2}\log(2\pi)+\frac{r}{2} \log \bigl\vert \Sigma_{\xi}^{-1} \bigr\vert - \frac{1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\mathcal{E}_{t}-\Phi^{t}X_{t} \bigr)^{t}\:\Sigma_{\xi}^{-1} \bigl( \mathcal{E}_{t}-\Phi^{t}X_{t} \bigr) \bigr]}. $$Thus the conditional log-likelihood evaluated at the residuals is:
$$\hat{l}(\hat{\Phi}, \hat{\Sigma}_{\xi})=-\frac{rT}{2}\log(2\pi)+\frac{r}{2}\log \bigl\vert \hat{\Sigma}_{\xi}^{-1} \bigr\vert -\frac{1}{2}\sum_{t=1}^{T}{ \bigl[ \bigl(\hat{\mathcal{E}}_{t}-\hat{\Phi}^{t}\hat{X}_{t} \bigr)^{t} \: \hat{\Sigma}_{\xi}^{-1} \bigl(\hat{\mathcal{E}}_{t}-\hat{\Phi}^{t} \hat{X}_{t} \bigr) \bigr]}. $$
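With no deterministic terms, the conditional ML estimates of the VAR coefficient matrices coincide with equation-by-equation least squares of the residual series on its p lags (Hamilton [10]); the sketch below exploits this equivalence, with names and shapes of our own choosing.

```python
import numpy as np

def fit_var(resid, p):
    """Conditional ML / least-squares fit of a VAR(p) to a residual series.

    resid : (T, r) array of (approximated) residuals E_t.
    Returns Phi with Phi[i] the (r, r) matrix of lag i+1, and the
    innovation covariance Sigma (r, r).
    """
    T, r = resid.shape
    # Regressor X_t stacks the p lags of E_t (no intercept: errors centred).
    X = np.hstack([resid[p - i - 1:T - i - 1] for i in range(p)])
    Y = resid[p:]
    B = np.linalg.lstsq(X, Y, rcond=None)[0]            # (r*p, r)
    Phi = np.stack([B[i * r:(i + 1) * r].T for i in range(p)])
    innov = Y - X @ B
    Sigma = innov.T @ innov / (T - p)
    return Phi, Sigma
```

On a simulated VAR(1) the routine recovers the generating matrix up to sampling error.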
Other considerations: the phenomenon of cointegration
Sometimes vectorial processes can be cointegrated, so one has to take into account the correlation structure between the series. The notion of cointegration has been one of the most important concepts in time series since Granger [9] and Engle and Granger [3] formally developed it. The issue has broad applications in the analysis of economic data, as well as numerous publications in the economic literature.
Let \(Y_{t}=(Y_{1,t},\ldots,Y_{r,t})^{t}\) be a vector of r time series integrated of order 1 (\(I(1)\)). \(Y_{t}\) is said to be cointegrated if there exists a linear combination of them that is stationary (\(I(0)\)), i.e., if there exists a vector \(\beta=(\beta_{1},\ldots,\beta_{r})^{t}\) such that
$$\beta^{t}Y_{t}=\beta_{1}Y_{1,t}+\cdots+\beta_{r}Y_{r,t}\sim I(0). $$
The vector β is called the cointegration vector. This vector is not unique, since for any scalar c the linear combination \(c\beta^{t}Y_{t}=\beta^{*t}Y_{t}\sim I(0)\). Therefore, a normalization is often assumed to identify a unique β. A typical normalization is \(\beta=(1,-\beta_{2},\ldots,-\beta_{r})^{t}\).
Johansen [11] addresses the issue of cointegration within an error correction model in the framework of vector autoregressive (VAR) models. Consider then a general VAR(p) model for the vector of r series \(Y_{t}\):
$$Y_{t}=\Phi_{1}Y_{t-1}+\cdots+\Phi_{p}Y_{t-p}+\Phi D_{t}+\epsilon_{t}, $$
where \(D_{t}\) contains deterministic terms (constant, trend, …).
Suppose \(Y_{t}\) is \(I(1)\) and possibly cointegrated. Then the VAR representation is not the most suitable one for analysis, because the cointegrating relationships are not explicitly apparent. They become apparent if the VAR model is transformed into a vector error correction model of order p (VECM(p)):
$$\Delta Y_{t}=\Pi Y_{t-1}+\sum_{k=1}^{p-1}{\Gamma_{k}\Delta Y_{t-k}}+\Phi D_{t}+\epsilon_{t}, $$
where \(\Pi=\Phi_{1}+\cdots+\Phi_{p}-I_{r}\), \(\Gamma_{k}=-\sum_{j=k+1}^{p}{\Phi_{j}}\), \(k=1,\ldots,p-1\) and \(\Delta Y_{t}=Y_{t}-Y_{t-1}\). The matrix Π is called the long-run impact matrix and the \(\Gamma_{k}\) are the short-run impact matrices. Moreover, the rank of the singular matrix Π gives the number of cointegration relations that exist, i.e., the cointegration rank. Johansen proposes a sequential procedure of likelihood ratio tests to estimate this rank.
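The VAR-to-VECM reparameterization is purely algebraic and easy to check numerically; the matrices in the usage example below are illustrative values, not taken from the paper.

```python
import numpy as np

def var_to_vecm(Phi):
    """Map VAR(p) matrices (Phi_1, ..., Phi_p) to (Pi, Gamma_1..Gamma_{p-1}).

    Pi      = Phi_1 + ... + Phi_p - I_r    (long-run impact matrix)
    Gamma_k = -(Phi_{k+1} + ... + Phi_p)   (short-run impact matrices)
    """
    p = len(Phi)
    r = Phi[0].shape[0]
    Pi = sum(Phi) - np.eye(r)
    Gamma = [-sum(Phi[k + 1:]) for k in range(p - 1)]
    return Pi, Gamma
```

The rank of the returned Π (e.g. via `np.linalg.matrix_rank`) is the quantity that Johansen's tests estimate.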
Prediction scheme
We present now the prediction scheme step by step:

1.
At every instant t, \(\varphi_{\kappa}(Z_{t})\) is estimated with the smooth backfitting technique, independently for each of the r components, using the data \((Y_{l},Z_{l-\kappa})\), \(l=\kappa+1,\ldots,T\).

2.
The residuals series \(\hat{\mathcal{E}}_{t+\kappa}\) is computed by
$$\hat{\mathcal{E}}_{t+\kappa}=Y_{t+\kappa}-\hat{\varphi}_{\kappa}(Z_{t}), \quad t=1,\ldots,T-\kappa. $$
3.
The following step is to fit an appropriate model for the error structure (VECM) and to obtain the prediction κ instants ahead: \(\dot{\mathcal{E}}_{T+\kappa}\).

4.
The proposed final prediction is given by (4).
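The four steps can be sketched end to end as follows. Here a one-covariate kernel smoother stands in for smooth backfitting and a VAR(1) for the general error model, with \(Z_{t}=Y_{t-1}\) as in the simulation study; all of these simplifications are ours.

```python
import numpy as np

def nw(ztr, ytr, z0, h):
    """Nadaraya-Watson estimate of E[Y | Z = z0] (Gaussian kernel)."""
    w = np.exp(-0.5 * ((ztr - z0) / h) ** 2)
    return np.sum(w * ytr) / np.sum(w)

def predict_kappa_ahead(Y, kappa=5, h=0.5):
    """Steps 1-4 with Z_t = Y_{t-1}: componentwise trend + VAR(1) errors."""
    T, r = Y.shape
    # Step 1: estimate phi_kappa componentwise from pairs (Y_l, Z_{l-kappa}).
    Z = Y[:T - kappa]                      # covariate: lagged series
    Yfut = Y[kappa:]                       # response kappa steps ahead
    trend = np.empty_like(Yfut)
    for k in range(r):
        trend[:, k] = [nw(Z[:, k], Yfut[:, k], z, h) for z in Z[:, k]]
    # Step 2: residual series.
    E = Yfut - trend
    # Step 3: VAR(1) fit to the residuals, iterated kappa steps ahead.
    Phi = np.linalg.lstsq(E[:-1], E[1:], rcond=None)[0].T
    e_pred = np.linalg.matrix_power(Phi, kappa) @ E[-1]
    # Step 4: final semiparametric prediction of the series kappa steps on.
    trend_now = np.array([nw(Z[:, k], Yfut[:, k], Y[-1, k], h)
                          for k in range(r)])
    return trend_now + e_pred
```

In the full scheme, step 3 would first test the residuals for cointegration and fit a VECM when appropriate.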
This scheme is a natural generalization of the one-dimensional prediction models described in Sect. 2.1.1. In the next two sections, simulation examples and a real data analysis are considered.
Results and discussion
A simulation study
To analyse the behaviour of the proposed prediction procedure, a simulation study has been performed, generating samples from artificial series and making a prediction study for k lags ahead using, in all cases, \(Z_{t}=Y_{t-1}\).
The following models are considered:
 Series 1. :

Two independent AR(3) with constant trend:
$$Y_{t}=\varphi+ \begin{pmatrix} \mathcal{E}_{1,t}\\ \mathcal{E}_{2,t} \end{pmatrix}, $$where
$$\begin{aligned} &\mathcal{E}_{1,t}=0.50\mathcal{E}_{1,t-1}-0.525\mathcal{E}_{1,t-2}+0.75\mathcal{E}_{1,t-3}+\eta_{1,t}, \\ &\mathcal{E}_{2,t}=0.1875\mathcal{E}_{2,t-1}-0.50\mathcal{E}_{2,t-2}+0.05\mathcal{E}_{2,t-3}+\eta_{2,t}, \end{aligned}$$with \(\eta_{1,t}\sim N (0,0.25^{2} )\), \(\eta_{2,t}\sim N (0,0.10^{2} )\) and \(\varphi=(25,10)^{t}\).
 Series 2. :

VAR(3) with constant trend:
$$Y_{t}=\varphi+\Pi_{1}(Y_{t-1}-\varphi)+\Pi_{2}(Y_{t-2}-\varphi)+\Pi_{3}(Y_{t-3}-\varphi)+\eta_{t}, $$where
$$\Pi_{1}=\begin{pmatrix} 0.50 & 0.3150\\ 0.75 & 0.1875 \end{pmatrix}, \qquad \Pi_{2}=\begin{pmatrix} -0.525 & 0\\ 0 & -0.50 \end{pmatrix}, \qquad \Pi_{3}=\begin{pmatrix} 0.75 & 0.375\\ 0.50 & 0.050 \end{pmatrix}, $$
$$\eta_{t}\sim N_{2} \left(\begin{pmatrix} 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0.25^{2} & 0\\ 0 & 0.10^{2} \end{pmatrix} \right), $$
and φ as in Series 1.
 Series 3. :

NPVAR(1) with independent VAR(3) noise:
$$Y_{t}=\varphi (Y_{t-1} )+\mathcal{E}_{t}, $$where
$$\varphi(y)=\begin{pmatrix} \varphi_{1}(y)\\ \varphi_{2}(y) \end{pmatrix}=\begin{pmatrix} 5\cos(y_{1})\\ 5\cos(y_{2}) \end{pmatrix} $$
and \(\mathcal{E}_{t}=\Pi_{1}\mathcal{E}_{t-1}+\Pi_{2}\mathcal{E}_{t-2}+\Pi_{3}\mathcal{E}_{t-3}+\eta_{t}\), where \(\Pi_{1}\), \(\Pi_{2}\), \(\Pi_{3}\) and \(\eta_{t}\) are as in the previous series.
 Series 4. :

VECM with constant trend:
$$Y_{t}=\varphi+ \begin{pmatrix} Y_{1,t}\\ Y_{2,t} \end{pmatrix}, $$where \(Y_{1,t}=Y_{1,t-1}+v_{t}\) and \(Y_{2,t}=Y_{1,t}+u_{t}\), with \(v_{t}\sim N (0,0.5^{2} )\), \(u_{t}=0.75 u_{t-1}+\eta_{t}\), \(\eta_{t}\sim N (0,0.5^{2} )\) and φ as in the first series.
In each case, 1000 bidimensional series of length 500 were generated from the models given above (\(Y_{1}^{i},\ldots,Y_{500}^{i}\) with \(1\leq i\leq1000\)). These values correspond to the generation after an initial stabilization period (starting at zero and discarding the first 500 values drawn). For every sample, \(M=500\) possible continuations of the series were obtained k periods ahead \((Y_{500+k}^{i1},\ldots,Y_{500+k}^{i500})\), which were compared with the prediction made from the sample \(Y_{1}^{i},\ldots,Y_{500}^{i}\).
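As an illustration of how such samples can be generated, the sketch below draws Series 4, the cointegrated pair: \(Y_{1}\) is a random walk and \(Y_{2}\) tracks it through a stationary AR(1) error, so \(Y_{2}-Y_{1}\) is \(I(0)\). The seed and implementation details are ours.

```python
import numpy as np

def simulate_series4(n=500, burn=500, phi=(25.0, 10.0), seed=0):
    """Series 4: Y1 a random walk, Y2 = Y1 + AR(1) error (cointegrated)."""
    rng = np.random.default_rng(seed)
    total = n + burn
    v = rng.normal(scale=0.5, size=total)
    eta = rng.normal(scale=0.5, size=total)
    y1 = np.cumsum(v)                      # Y_{1,t} = Y_{1,t-1} + v_t
    u = np.zeros(total)
    for t in range(1, total):
        u[t] = 0.75 * u[t - 1] + eta[t]    # stationary AR(1) error
    y2 = y1 + u                            # Y_{2,t} = Y_{1,t} + u_t
    Y = np.column_stack([y1, y2]) + np.asarray(phi)
    return Y[burn:]                        # drop the stabilization period
```

The spread \(Y_{2,t}-Y_{1,t}\) is exactly the stationary AR(1) error, so the cointegration vector is \((1,-1)^{t}\) up to the constant shift.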
For each of these series, three predictors are compared:

(a)
The nonparametric predictor using additive models with the estimation of each component independently (NPM).

(b)
The semiparametric predictor for additive models to estimate the trend of each bidimensional series component independently with model P1 for the residuals (SPM).

(c)
The semiparametric predictor for additive models to estimate the trend of each bidimensional series component independently with VAR modelling for the vector residuals proposed in the previous section as P2 (SPBM).
Thus, as noted above, denoting by \(Y_{1}^{i},\ldots,Y_{500}^{i}\), \(i=1,\ldots,N=1000\), each of the simulated series and by \(\hat{Y}_{500+k}^{i(a)}\), \(\hat{Y}_{500+k}^{i(b)}\) and \(\hat{Y}_{500+k}^{i(c)}\), \(k=1,\ldots,30\), the predictors given by methods (a), (b) and (c), respectively, the methods are compared using the Mean Square Prediction Errors:
$$\operatorname{MSPE}^{(l)}(k)=\frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}{ \bigl\Vert Y_{500+k}^{ij}-\hat{Y}_{500+k}^{i(l)} \bigr\Vert ^{2}}, $$
where \(Y_{500+k}^{ij}\) represents the observed value of the jth prolongation of the ith series, \(j=1,\ldots,M=500\), \(l=a, b\) or c and \(k=1,\ldots,30\).
The results are summarized in Tables 1 to 4. “\(\operatorname{MSPE}(a,b,c)\)” denotes the mean square prediction error (6) for methods (a), (b) and (c), respectively. It can be seen that the proposed semiparametric method improves on the behaviour of the other two, especially in the first lags; as the lag grows, the differences between the three methods become smaller. This is illustrated in Fig. 3, which compares the distribution of the prediction errors obtained with the three predictors for the second model.
Real data application
The general model proposed in Sect. 3.1 was implemented for the particular case of the prediction of the levels of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) in the vicinity of the power station and the combined cycle plant.
Let \(X_{t}\) be the bidimensional series formed by the one-hour mean series of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) at each minute t. In terms of equation (3), we consider \(Y_{t}=X_{t+\kappa}\) and \(Z_{t}= (X_{t},X_{t}-X_{t-5} )\). If \(\hat{X}_{i}\) denotes the observed values for past instants (\(i\leq t\)) and the best prediction for future instants (\(i>t\)), the aim is to predict \(X_{t+30}\) by the following algorithm:

At every instant t, \(\varphi (Z_{t} )\) is estimated with additive models and the information provided by the historical matrix, independently for each component. The estimation of φ is done 30 instants ahead: \(\dot{Y}_{t}=\dot{X}_{t+30}=\hat{\varphi}_{30} (Z_{t} )+\dot{e}_{t+30}\).

The residual series \(\hat{e}_{t}\) is computed as \(\hat{e}_{t}=Y_{t}-\hat{\varphi}_{30} (Z_{t} )\) and a test of model adequacy (for instance, the Ljung–Box test) is performed for each component of the series over the last four hours (240 observations).

If any of the components of the residual series is not white noise, a test is performed to explore whether the vectorial residual series is cointegrated. If this is the case, an adequate VECM is fitted; if the series is not cointegrated, a VAR model is fitted.

Thus \(\dot{e}_{t+30}\) is obtained.

The proposed final prediction, given by the Semiparametric Bidimensional Model with the nonparametric part estimated 30 instants ahead (SPBM), is:
$$\dot{X}_{t+30}=\hat{\varphi}_{30} (Z_{t} )+ \dot{e}_{t+30}. $$
To observe the behaviour of the prediction model, we have evaluated its performance on two episodes of air quality alteration whose information was not included in the historical matrix.
Figure 4 shows the forecasts given half an hour in advance by the proposed models for an episode recorded at one of the sampling stations. The good behaviour of the forecasts can easily be seen. The proposals estimate quite well the real one-hour means of the \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) values. This is confirmed in Table 5, which contains three measures of accuracy for the pure nonparametric predictor (NPM) and the proposed semiparametric predictor (SPBM), based on the following criteria:

(a)
Squared error: \(\textrm{SE}=\sum_{t} (y_{t}-\hat{y}_{t} )^{2}\).

(b)
Absolute error: \(\textrm{AE}= \vert y_{t}-\hat{y}_{t} \vert \).

(c)
Relative absolute error (%): \(\textrm{RAE}=100{ \bigl\vert \frac{y_{t}-\hat{y}_{t}}{y_{t}} \bigr\vert }\).
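These criteria, together with the means and medians reported in Tables 5 and 6, can be computed as in the sketch below; here the pointwise values are kept (rather than summed over t) so that the mean and median are well defined.

```python
import numpy as np

def accuracy_summary(y, y_hat):
    """Mean (M) and median (Md) of pointwise SE, AE and RAE over an episode."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    se = (y - y_hat) ** 2                   # squared error
    ae = np.abs(y - y_hat)                  # absolute error
    rae = 100 * np.abs((y - y_hat) / y)     # relative absolute error (%)
    return {name: (np.mean(v), np.median(v))
            for name, v in [("SE", se), ("AE", ae), ("RAE", rae)]}
```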
The mean (M) and the median (Md) of these three measures have been computed for the period covering the pollution incident proper (02.00 to 10.00 hours). The \(\mathrm{SO}_{2}\) nonparametric prediction with the historical matrix captures very well the behaviour of the real series (RAE: 24.18%), and the semiparametric prediction is not able to improve on it (RAE: 27.15%). However, the \(\mathrm{NO}_{x}\) prediction given by the SPBM (RAE: 21.35%) notably improves the one obtained by the NPM (RAE: 29.77%). Furthermore, the residual series was detected as cointegrated 123 times (8.37%), mainly when the highest values of the episode occurred.
In another \(\mathrm{SO}_{2}\) episode recorded at one of the sampling stations (see Fig. 5), the behaviour of the predictors is somewhat different. The \(\mathrm{SO}_{2}\) prediction given by the NPM (RAE: 43.92%) does not entirely capture the behaviour of the real series, so the semiparametric prediction (RAE: 38.48%) improves on those results, as shown in Table 6. In this episode, the \(\mathrm{NO}_{x}\) values are very low (practically zero) and therefore there are no cointegration relationships.
Conclusions
This paper reviews several prediction models implemented over the years for the prediction of \(\mathrm{SO}_{2}\) levels in the vicinity of a power station. This evolution reflects the adaptation of the statistical models to increasingly demanding environmental regulations and to the availability of new technological resources that allow estimation in more complex situations.
The last part of the paper is devoted to a new proposal that, following the same philosophy as the previous univariate models, extends the semiparametric model to the multivariate framework. In particular, the paper deals with the joint prediction of \(\mathrm{SO}_{2}\) and \(\mathrm{NO}_{x}\) levels using natural extensions of the univariate model. These models, originally developed for financial applications, are successfully adapted to the environmental problem, showing good results both in the simulation studies and in the real data application. The semiparametric joint predictor (SPBM) obtains results similar to those of the nonparametric predictor (NPM) and the semiparametric independent predictor (SPM) in those scenarios where the components of the response are not related (see Table 1); recall that NPM and SPM are constructed under this assumption. In the scenarios with dependence among components, SPBM clearly beats its competitors (see Tables 2–4) and also shows good results in the real data application.
Abbreviations
 SIPEI:

Integrated System of Statistical Prediction of Immission (from the Spanish acronym)
 ARIMA:

Autoregressive Integrated Moving Average
 VECM:

Vector Error Correction Model
 NPM:

Nonparametric Model
 SPM:

Semiparametric Model
 SPBM:

Semiparametric Bidimensional Model
 MSPE:

Mean Square Prediction Error
References
 1.
Box GEP, Jenkins GM, Reinsel GC. Time series analysis: forecasting and control. New York: Wiley; 2008.
 2.
Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models. Ann Stat. 1989;17:453–510.
 3.
Engle RF, Granger CWJ. Cointegration and error correction: representation, estimation and testing. Econometrica. 1987;57:251–76.
 4.
Fernández de Castro B, González-Manteiga W. Boosting for real and functional samples: an application to an environmental problem. Stoch Environ Res Risk Assess. 2008;22(1):27–37.
 5.
Fernández de Castro B, Guillas S, González-Manteiga W. Functional samples and bootstrap for predicting sulfur dioxide levels. Technometrics. 2005;47(2):212–22.
 6.
Fernández de Castro B, Prada-Sánchez J, González-Manteiga W, Febrero-Bande M, Bermúdez Cela J, Hernández Fernández J. Prediction of SO2 levels using neural networks. J Air Waste Manage Assoc. 2003;53(5):532–9.
 7.
Friedman J, Stuetzle W. Projection pursuit regression. J Am Stat Assoc. 1981;76(376):817–23.
 8.
García-Jurado I, González-Manteiga W, Prada-Sánchez J, Febrero-Bande M, Cao R. Predicting using Box–Jenkins, nonparametric, and bootstrap techniques. Technometrics. 1995;37(3):303–10.
 9.
Granger C. Cointegrated variables and error-correcting models. PhD thesis, Discussion Paper 83-13. Department of Economics, University of California at San Diego; 1983.
 10.
Hamilton JD. Time series analysis. vol. 2. Princeton: Princeton University Press; 1994.
 11.
Johansen S. Statistical analysis of cointegration vectors. J Econ Dyn Control. 1988;12(2):231–54.
 12.
Mammen E, Linton O, Nielsen J. The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann Stat. 1999;27(5):1443–90.
 13.
Nadaraya EA. On estimating regression. Theory Probab Appl. 1964;9(1):141–2.
 14.
Prada-Sánchez J, Febrero-Bande M. Parametric, nonparametric and mixed approaches to prediction of sparsely distributed pollution incidents: a case study. J Chemom. 1997;11(1):13–32.
 15.
Prada-Sánchez J, Febrero-Bande M, Cotos-Yáñez T, González-Manteiga W, Bermúdez-Cela J, Lucas-Domínguez T. Prediction of SO2 pollution incidents near a power station using partially linear models and an historical matrix of predictor-response vectors. Environmetrics. 2000;11(2):209–25.
 16.
Roca-Pardiñas J, Cadarso-Suárez C, González-Manteiga W. Testing for interactions in generalized additive models: application to SO2 pollution data. Stat Comput. 2005;15(4):289–99.
 17.
Roca-Pardiñas J, González-Manteiga W, Febrero-Bande M, Prada-Sánchez J, Cadarso-Suárez C. Predicting binary time series of SO2 using generalized additive models with unknown link function. Environmetrics. 2004;15(7):729–42.
 18.
Speckman P. Kernel smoothing in partial linear models. J R Stat Soc, Ser B, Stat Methodol. 1988;50:413–36.
 19.
Watson GS. Smooth regression analysis. Sankhya, Ser A. 1964;26:359–72.
Acknowledgements
The work by Wenceslao González-Manteiga and Manuel Febrero-Bande was partially supported by projects MTM2013-41383-P and MTM2016-76969-P from the Spanish Ministry of Science and Innovation and European Regional Development Fund and the IAP network StUDyS from Belgian Science Policy.
Availability of data and materials
Please contact authors for data requests.
Funding
Endesa Generation S.A. has, over the last 20 years, signed several contracts with academic departments for the development of an integral program for pollution prevention around the facility. The three authors have been involved in some of those contracts in different roles.
Author information
Author notes
Affiliations
Contributions
The three authors contributed equally to this paper. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Manuel Febrero-Bande.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wenceslao González-Manteiga, Manuel Febrero-Bande and María Piñeiro-Lamas contributed equally to this work.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Received
Accepted
Published
DOI
Keywords
 Semiparametric prediction models
 Pollution indicators
 Cointegration