Keywords

1 Introduction

In this paper we introduce a doubly robust approach to modelling the volatility of electricity spot prices, minimizing the misleading effects of the extreme jumps that characterize this particular kind of data on the predictions. The purpose of this paper is twice: first, it is possible to enhance the prediction from point to intervals with associated probability levels, second, we can detect possible extreme prices which are commonly observed in electricity markets. Although many papers have applied quite sophisticated time series models to prices and demand time series of electricity and gas, very few have considered the strong influence of jumps on estimates and the need to move to robust estimators [4]. Thus, the big differences between our paper and the previous literature related to jumps in electricity prices are twofold: (1) we don’t focus on the prediction of jumps; (2) we apply robust estimators which are not strongly influenced by the jumps. In this way we improve the prediction of normal prices which represent the majority of data. Our approach could then be integrated with other methods dealing with jumps forecasting.

Our methodology is applied separately to the hourly time series of Italian day-ahead electricity price data from January 1st, 2013 to December 31th, 2015. In the first step, we apply a threshold autoregressive model (SETAR) to the time series of logarithmic prices. In the second step the weighted forward search estimator (WFSE) for GARCH(1,1) models is applied to the residuals extracted from the first step in order to estimate and forecast volatility. The weighted forward search is a modification of the Forward Search intending to correct the effects of extreme observations on the estimates of GARCH(1,1) models. Differently from the original Forward Search, at each step of the search estimation involves all observations which are weighted according to their degree of outlyingness.

2 A Robust SETAR Model for Electricity Prices

It is well-known that the series of electricity prices have long-run behaviour and annual dynamics, which change according with the load period. A common characteristic of price time series is the weekly periodic component (of period 7), suggested by the spectra that show three peaks at the frequencies 1∕7, 2∕7 and 3∕7, and a very persistent autocorrelation function.

We assume that the dynamics of log prices can be represented by a nonstationary level component L tj, accounting for level changes and/or long-term behaviour, and a residual stationary component p tj, formally:

$$\displaystyle \begin{aligned} \log P_{tj}=L_{tj}+p_{tj}. \end{aligned} $$
(1)

To estimate L tj we use the wavelets approach [3]. We consider the Daubechies least asymmetric wavelet family, LA(8), and the coefficients were estimated via the maximal overlap discrete wavelet transform (MODWT) method (for details, see Percival and Walden [5]).

The detrended prices p t will then be modeled by a two-regime Self-Exciting Threshold AutoRegressive model SETAR(7,1) which is specified as

$$\displaystyle \begin{aligned} p_t= \begin{cases} {\mathbf{x}}_{t} \boldsymbol{\beta}_1+\varepsilon_{t}, \hspace{0.5cm} \mathrm{if} \hspace{0.5cm} p_{t-1}\leq \gamma \hspace{0.5cm} \mathrm{and} \hspace{0.5cm} {\mathbf{x}}_{t}\boldsymbol{\beta}_2+\varepsilon_{t} \hspace{0.5cm} \mathrm{if} \hspace{0.5cm} p_{t-1}> \gamma \end{cases} \end{aligned} $$
(2)

for t = 1, …, N, where p t−1 is the threshold variable and γ is the threshold value. The relation between p t−1 and γ states if p t is observed in regime 1 or 2. β j is the parameter vector for regime j = 1, 2 containing 7 coefficients to account for weekly periodicity. x t is the t-th row of the (N × 7) matrix X comprising 7 lagged variables of p t (and eventually a constant). Errors ε t are assumed to follow an iid(0, σ ε) distribution.

Parameters can be estimated by sequential conditional least squares. In the case of robust two-regime SETAR model, for a fixed threshold γ the GM estimate of the autoregressive parameters can be obtained by applying the iterative weighted least squares:

$$\displaystyle \begin{aligned} \hat{\boldsymbol{\beta}}_j^{(n+1)}=\left({\mathbf{X}}^{\prime}_j{\mathbf{W}}^{(n)}{\mathbf{X}}_j\right)^{-1}{\mathbf{X}}^{\prime}_j{\mathbf{W}}^{(n)}{\mathbf{p}}_j\, \end{aligned} $$
(3)

where \(\hat {\boldsymbol {\beta }}_j^{(n+1)}\) is the GM estimate for the parameter vector in regime j = 1, 2 after the n-th iteration from an initial estimate \(\hat {\boldsymbol {\beta }}_j^{(0)}\), and W (n) is a weight diagonal matrix, those elements depend on a weighting function \(w(\hat {\boldsymbol {\beta }}_j^{(n)},\hat {\sigma }_{\varepsilon ,j}^{(n)})\) bounded between 0 and 1. The GM weights are presented in Schweppe’s form \(w(\hat {\boldsymbol {\beta }}_j,\hat {\sigma }_{\varepsilon ,j})=\psi (r_t)/r_t\) with standardized residuals \(r_t=(p_t-{\mathbf {x}}_{t}\hat {\boldsymbol {\beta }}_j)/(\hat {\sigma }_{\varepsilon ,j}w({\mathbf {x}}_{t}))\) and w(x t) = ψ(d(x t)α)∕d(x t)α. \(\mathrm {d}({\mathbf {x}}_{t})=|{\mathbf {x}}_{t}-m_{p,j}|/\hat {\sigma }_{p,j}\) is the Mahalanobis distance and α is a constant usually set equal to 2 to obtain robustness of standard errors. The weight function is the Polynomial ψ function as in Grossi and Nan [2]. The threshold γ is estimated by minimizing the objective function \(\sum _{t=1}^N w(\hat {\boldsymbol {\beta }},\hat {\sigma }_{\varepsilon })(p_t-{\mathbf {x}}_{t}\hat {\boldsymbol {\beta }})^2\) over the set Γ of allowable threshold values.

3 Robust Volatility Estimation Through the Forward Search

Now we apply the weighted forward search (WFS) estimator to derive robust prediction intervals for the volatility of electricity prices, starting from the residuals of the SETAR model estimated in Sect. 1. Let 𝜖 t denote an observed time series of heteroscedastic residuals. For electricity prices \(\epsilon _t = p_t-\hat {p_{t}}\) where \(\hat {p_t}\) is the price fitted by the robust SETAR model. We proceed now by estimating a standard GARCH(1,1) model on residuals 𝜖 t, so that \(\epsilon _t|F_{t-1} \sim N\left (0,\sigma _t^2\right )\) and

$$\displaystyle \begin{aligned} \sigma_t^2=\alpha_0+\alpha_1\epsilon_{t-1}^2+\beta\sigma_{t-1}^2\end{aligned} $$
(4)

with α 0 > 0, α 1 ≥ 0, β ≥ 0, α 1 + β < 1.

The WFS (see Crosato and Grossi [1], for details) starts with the selection of a small subset of observations showing the minimum median of squared residuals. Then, at each step of the search, MLE estimation is carried out on all observations weighted to account for their degree of outlyingness so that the estimation pattern along the search is not influenced by the presence of spikes. All units, a part from the first b, are ranked according to their squared standardized residuals with respect to the model estimated at the previous step of the search. Proceeding along the search, an increasing number of units are given weight 1 while the remaining are downweighted by the corresponding value of the complementary cumulative distribution function of the squared standardized residuals defined on the whole sample. The weights range from approximately 0 to 1 so that outliers will be heavily downweighted until the end of the search while the closer the weight to 1 the stronger is the degree of agreement of the observation with the estimated GARCH. This way the temporal structure of the time series is respected, filling the time gaps created by the forward search ordering but on the same time observations are ordered consistently with the GARCH model estimated until the last steps when all observations enter with their original value.

A plus of the WFS approach is that outliers are identified automatically through a test reducing the arbitrariness in declaring a given observation as outlier. Once outliers have been identified the forward plots of coefficient estimates are cut automatically and WFS GARCH estimates are obtained.

The outliers identified through the WFS test vary from a minimum of 29 (hour 21) to a maximum of 59 (hours 8 and 9). A few outliers characterize many hours of the same day, as is the case of the 21st and 27th of April, 2013 for 14 h as well as mot days of June 2013. Spikes in 2014 characterize mainly the months of February, March and April. Outliers for 2015, hours 14 and 16, are highlighted by red circles in Fig. 1.

Fig. 1
figure 1

Forecasted conditional confidence intervals of volatility for hourly electricity prices in Italy (from 13 to 18, January to December 2015) obtained by MLE (light blue) and WFS (dotted red)

Using the MLE and WFS estimated coefficients and applying Eq. (4) for the conditional variance recursively, we then obtain the forecasted price volatility and the corresponding 95% confidence intervals (see Fig. 1). As can be seen, the WFS intervals (dotted red) are tighter around the realized volatility than the MLE ones (light blue). The gain in prediction is evident in the aftermath of a spike or a drop, for instance in April and May. Note that the correction provided by the WFS works also for smallest jumps, as in July and December.