## ARIMA modelling in R## How does auto.arima() work ?The ## Hyndman-Khandakar algorithm for automatic ARIMA modellingThe number of differences d is determined using repeated KPSS tests. The values of p and q are then chosen by minimizing the AICc after differencing the data d times. Rather than considering every possible combination of p and q, the algorithm uses a stepwise search to traverse the model space. (a) The best model (with smallest AICc) is selected from the following four: ARIMA(2,d,2), ARIMA(0,d,0), ARIMA(1,d,0), ARIMA(0,d,1).If d=0 then the constant c is included; if d≥1 then the constant c is set to zero. This is called the "current model". (b) Variations on the current model are considered: - vary p and/or q from the current model by ±1;
- include/exclude c from the current model.
The best model considered so far (either the current model, or one of these variations) becomes the new current model. (c) Repeat Step 2(b) until no lower AICc can be found.
## Choosing your own modelIf you want to choose the model yourself, use the R code fit <- Arima(usconsumption[,1], order=c(0,0,3)) There is another function ## Modelling procedureWhen fitting an ARIMA model to a set of time series data, the following procedure provides a useful general approach. - Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
- If the data are non-stationary: take first differences of the data until the data are stationary.
- Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
- Try your chosen model(s), and use the AICc to search for a better model.
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.
The automated algorithm only takes care of steps 3–5. So even if you use it, you will still need to take care of the other steps yourself. The process is summarized in Figure 8.10. ## Example 8.2 Seasonally adjusted electrical equipment ordersWe will apply this procedure to the seasonally adjusted electrical equipment orders data shown in Figure 8.11. R code eeadj <- seasadj(stl(elecequip, s.window="periodic")) plot(eeadj) - The time plot shows some sudden changes, particularly the big drop in 2008/2009. These changes are due to the global economic environment. Otherwise there is nothing unusual about the time plot and there appears to be no need to do any data adjustments.
- There is no evidence of changing variance, so we will not do a Box-Cox transformation.
- The data are clearly non-stationary as the series wanders up and down for long periods. Consequently, we will take a first difference of the data. The differenced data are shown in Figure 8.12. These look stationary, and so we will not consider further differences.R codetsdisplay(diff(eeadj),main="")
- The PACF shown in Figure 8.12 is suggestive of an AR(3) model. So an initial candidate model is an ARIMA(3,1,0). There are no other obvious candidate models.
- We fit an ARIMA(3,1,0) model along with variations including ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. Of these, the ARIMA(3,1,1) has a slightly smaller AICc value.R output> fit <- Arima(eeadj, order=c(3,1,1))
> summary(fit) Series: eeadj ARIMA(3,1,1) Coefficients: ar1 ar2 ar3 ma1 0.0519 0.1191 0.3730 -0.4542 s.e. 0.1840 0.0888 0.0679 0.1993 sigma^2 estimated as 9.532: log likelihood=-484.08 AIC=978.17 AICc=978.49 BIC=994.4 - The ACF plot of the residuals from the ARIMA(3,1,1) model shows all correlations within the threshold limits indicating that the residuals are behaving like white noise. A portmanteau test returns a large p-value, also suggesting the residuals are white noise.R codeAcf(residuals(fit))
Box.test(residuals(fit), lag=24, fitdf=4, type="Ljung") - Forecasts from the chosen model are shown in Figure 8.13.
R code plot(forecast(fit)) If we had used the automated algorithm instead, we would have obtained exactly the same model in this example. ## Understanding constants in RA non-seasonal ARIMA model can be written as (1−ϕ1B−⋯−ϕpBp)(1−B)dyt=c+(1+θ1B+⋯+θqBq)et(8.2) or equivalently as (1−ϕ1B−⋯−ϕpBp)(1−B)d(yt−μtd/d!)=(1+θ1B+⋯+θqBq)et,(8.3) where c=μ(1−ϕ1−⋯−ϕp) and μ is the mean of (1−B)dyt. R uses the parametrization of equation (8.3). Thus, the inclusion of a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order din the forecast function. (If the constant is omitted, the forecast function includes a polynomial trend of order d−1.) When d=0, we have the special case that μis the mean of yt. ## arima()By default, the ## Arima()The There is also an argument ## auto.arima()The ## Forecasting## Point forecastsAlthough we have calculated forecasts from the ARIMA models in our examples, we have not yet explained how they are obtained. Point forecasts can be calculated using the following three steps. - Expand the ARIMA equation so that yt is on the left hand side and all other terms are on the right.
- Rewrite the equation by replacing t by T+h.
- On the right hand side of the equation, replace future observations by their forecasts, future errors by zero, and past errors by the corresponding residuals.
Beginning with h=1, these steps are then repeated for h=2,3,… until all forecasts have been calculated. The procedure is most easily understood via an example. We will illustrate it using the ARIMA(3,1,1) model fitted in the previous section. The model can be written as follows (1−ϕ1B−ϕ2B2−ϕ3B3)(1−B)yt=(1+θ1B)et, where ϕ1=0.0519, ϕ2=0.1191, ϕ3=0.3730 and θ1=−0.4542. Then we expand the left hand side to obtain [1−(1+ϕ1)B+(ϕ1−ϕ2)B2+(ϕ2−ϕ3)B3+ϕ3B4]yt=(1+θ1B)et, and applying the backshift operator gives yt−(1+ϕ1)yt−1+(ϕ1−ϕ2)yt−2+(ϕ2−ϕ3)yt−3+ϕ3yt−4=et+θ1et−1. Finally, we move all terms other than yt to the right hand side: yt=(1+ϕ1)yt−1−(ϕ1−ϕ2)yt−2−(ϕ2−ϕ3)yt−3−ϕ3yt−4+et+θ1et−1.(8.2) This completes the first step. While the equation now looks like an ARIMA(4,0,1), it is still the same ARIMA(3,1,1) model we started with. It cannot be considered an ARIMA(4,0,1) because the coefficients do not satisfy the stationarity conditions. For the second step, we replace t by T+1 in (8.2): yT+1=(1+ϕ1)yT−(ϕ1−ϕ2)yT−1−(ϕ2−ϕ3)yT−2−ϕ3yT−3+eT+1+θ1eT. Assuming we have observations up to time T, all values on the right hand side are known except for eT+1 which we replace by zero and eT which we replace by the last observed residual e^T: y^T+1|T=(1+ϕ1)yT−(ϕ1−ϕ2)yT−1−(ϕ2−ϕ3)yT−2−ϕ3yT−3+θ1e^T. A forecast of yT+2 is obtained by replacing t by T+2 in (8.2). All values on the right hand side will be known at time T except yT+1 which we replace by y^T+1|T, and eT+2 and eT+1, both of which we replace by zero: y^T+2|T=(1+ϕ1)y^T+1|T−(ϕ1−ϕ2)yT−(ϕ2−ϕ3)yT−1−ϕ3yT−2. The process continues in this manner for all future time periods. In this way, any number of point forecasts can be obtained. ## Forecast intervalsThe calculation of ARIMA forecast intervals is much more difficult, and the details are largely beyond the scope of this book. We will just give some simple examples. The first forecast interval is easily calculated. If σ^ is the standard deviation of the residuals, then a 95% forecast interval is given by y^T+1|T±1.96σ^. This result is true for all ARIMA models regardless of their parameters and orders. Multi-step forecast intervals for ARIMA(0,0,q) models are relatively easy to calculate. We can write the model as yt=et+∑i=1qθiet−i. Then the estimated forecast variance can be written as vT+h|T=σ^2[1+∑i=1h−1θ2i],for h=2,3,…, and a 95% forecast interval is given by y^T+h|T±1.96vT+h|T−−−−−√. In Section 8/4, we showed that an AR(1) model can be written as an MA(∞) model. Using this equivalence, the above result for MA(q) models can also be used to obtain forecast intervals for AR(1) models. More general results, and other special cases of multi-step forecast intervals for an ARIMA(p,d,q) model, are given in more advanced textbooks such asBrockwell and Davis (1991, Section 9.5). The forecast intervals for ARIMA models are based on assumptions that the residuals are uncorrelated and normally distributed. If either of these are assumptions do not hold, then the forecast intervals may be incorrect. For this reason, always plot the ACF and histogram of the residuals to check the assumptions before producing forecast intervals. In general, forecast intervals from ARIMA models will increase as the forecast horizon increases. For stationary models (i.e., with d=0), they will converge so forecast intervals for long horizons are all essentially the same. For d>1, the forecast intervals will continue to grow into the future. As with most forecast interval calculations, ARIMA-based intervals tend to be too narrow. This occurs because only the variation in the errors has been accounted for. There is also variation in the parameter estimates, and in the model order, that has not been included in the calculation. In addition, the calculation assumes that the historical patterns that have been modelled will continue into the forecast period. |