
# Non-seasonal ARIMA models

If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model. ARIMA is an acronym for AutoRegressive Integrated Moving Average model (“integration” in this context is the reverse of differencing). The full model can be written as

$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q} + e_t, \qquad (8.1)$$

where $y'_t$ is the differenced series (it may have been differenced more than once). The “predictors” on the right hand side include both lagged values of $y'_t$ and lagged errors. We call this an ARIMA($p,d,q$) model, where

$p$ = order of the autoregressive part;
$d$ = degree of first differencing involved;
$q$ = order of the moving average part.

The same stationarity and invertibility conditions that are used for autoregressive and moving average models apply to this ARIMA model.
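
To make the definition concrete, here is a minimal sketch (Python/NumPy; the values of $c$, $\phi_1$ and $\theta_1$ are illustrative choices, not taken from the text) that simulates an ARIMA(1,1,1) series directly from equation (8.1): the differenced series follows the ARMA recursion, and a cumulative sum reverses the single difference.

```python
import numpy as np

# Simulate an ARIMA(1,1,1) series following equation (8.1):
# first generate the differenced series y'_t as an ARMA(1,1),
# then integrate (cumulative sum) to undo the d = 1 differencing.
# c, phi and theta below are illustrative values, not from the text.
rng = np.random.default_rng(42)
T, c, phi, theta = 200, 0.5, 0.6, 0.3

e = rng.normal(0, 1, T)
y_diff = np.zeros(T)
for t in range(1, T):
    y_diff[t] = c + phi * y_diff[t - 1] + theta * e[t - 1] + e[t]

y = np.cumsum(y_diff)   # "integration" is the reverse of differencing
print(len(y))
```

Differencing the simulated series recovers the ARMA component, which is exactly the relationship the I in ARIMA expresses.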

Once we start combining components in this way to form more complicated models, it is much easier to work with the backshift notation. Then equation (8.1) can be written as

$$\underbrace{(1-\phi_1 B - \cdots - \phi_p B^p)}_{\text{AR}(p)}\;\underbrace{(1-B)^d}_{d\text{ differences}}\,y_t = c + \underbrace{(1+\theta_1 B + \cdots + \theta_q B^q)}_{\text{MA}(q)}\,e_t$$

Selecting appropriate values for $p$, $d$ and $q$ can be difficult. The `auto.arima()` function in R will do it for you automatically. Later in this chapter, we will learn how the function works, and some methods for choosing these values yourself.

Many of the models we have already discussed are special cases of the ARIMA model as shown in the following table.

| Model | ARIMA equivalent |
| --- | --- |
| White noise | ARIMA(0,0,0) |
| Random walk | ARIMA(0,1,0) with no constant |
| Random walk with drift | ARIMA(0,1,0) with a constant |
| Autoregression | ARIMA($p$,0,0) |
| Moving average | ARIMA(0,0,$q$) |
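
As a quick illustration of one row of this table, the sketch below (Python/NumPy; the drift value is an arbitrary choice) builds an ARIMA(0,1,0) model with a constant by integrating white noise plus a constant, which is exactly a random walk with drift:

```python
import numpy as np

# An ARIMA(0,1,0) with a constant: the differenced series is c + e_t
# (white noise plus a constant), so integrating it gives a random walk
# with drift, y_t = c + y_{t-1} + e_t. The drift c = 0.7 is illustrative.
rng = np.random.default_rng(0)
c, T = 0.7, 500
e = rng.normal(0, 1, T)
y = np.cumsum(c + e)

# The average step size of the walk should be close to the drift c.
print(np.mean(np.diff(y)))
```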

## Example 8.1 US personal consumption

Figure 8.7 shows quarterly percentage changes in US consumption expenditure. Although it is quarterly data, there appears to be no seasonal pattern, so we will fit a non-seasonal ARIMA model.

Figure 8.7: Quarterly percentage change in US consumption expenditure.

The following R code was used to automatically select a model.

R output
```
> fit <- auto.arima(usconsumption[,1], seasonal=FALSE)

ARIMA(0,0,3) with non-zero mean

Coefficients:
         ma1     ma2     ma3  intercept
      0.2542  0.2260  0.2695     0.7562
s.e.  0.0767  0.0779  0.0692     0.0844

sigma^2 estimated as 0.3856:  log likelihood=-154.73
AIC=319.46   AICc=319.84   BIC=334.96
```

This is an ARIMA(0,0,3) or MA(3) model:

$$y_t = 0.756 + e_t + 0.254 e_{t-1} + 0.226 e_{t-2} + 0.269 e_{t-3},$$

where $e_t$ is white noise with standard deviation $0.62 = \sqrt{0.3856}$. Forecasts from the model are shown in Figure 8.8.
Figure 8.8: Forecasts of quarterly percentage change in US consumption expenditure.

R code
```r
plot(forecast(fit, h=10), include=80)
```
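
The mechanics behind these forecasts can be sketched in a few lines. For an MA(3) model, the point forecast at horizon $h$ keeps only the $\theta_j$ terms whose errors have already been observed, so beyond $h=3$ every forecast collapses to the mean. The residual values below are hypothetical placeholders, not taken from the usconsumption fit:

```python
import numpy as np

c = 0.756                               # estimated intercept from Example 8.1
theta = np.array([0.254, 0.226, 0.269]) # estimated MA coefficients
last_e = np.array([0.4, -0.1, 0.2])     # e_T, e_{T-1}, e_{T-2}: hypothetical values

forecasts = []
for h in range(1, 6):
    f = c
    for j in range(h, 4):               # only errors up to time T are known;
        f += theta[j - 1] * last_e[j - h]   # future errors are replaced by zero
    forecasts.append(f)
print(forecasts)
```

Note how the forecasts for $h=4$ and $h=5$ are both exactly 0.756, which is why the forecast fan in Figure 8.8 flattens to the series mean after a few quarters.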

## Understanding ARIMA models

The `auto.arima()` function is very useful, but anything automated can be a little dangerous, and it is worth understanding something of the behaviour of the models even when you rely on an automatic procedure to choose the model for you. The constant $c$ has an important effect on the long-term forecasts obtained from these models.

• If $c=0$ and $d=0$, the long-term forecasts will go to zero.
• If $c=0$ and $d=1$, the long-term forecasts will go to a non-zero constant.
• If $c=0$ and $d=2$, the long-term forecasts will follow a straight line.
• If $c\neq0$ and $d=0$, the long-term forecasts will go to the mean of the data.
• If $c\neq0$ and $d=1$, the long-term forecasts will follow a straight line.
• If $c\neq0$ and $d=2$, the long-term forecasts will follow a quadratic trend.
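
The fourth case can be checked by iterating the forecast recursion by hand. The sketch below (illustrative AR(1) parameters, not taken from the text) shows the long-term forecasts converging to the process mean $c/(1-\phi_1)$:

```python
# Iterated point forecasts from an AR(1) with c != 0 and d = 0:
# at each step the future error is replaced by its expected value, zero,
# so the forecast contracts towards the process mean c / (1 - phi).
# c and phi are illustrative values.
c, phi = 2.0, 0.5
f = 10.0                  # start from an arbitrary last observation
for _ in range(100):
    f = c + phi * f
print(f)                  # converges to c / (1 - phi) = 4.0
```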

The value of $d$ also has an effect on the prediction intervals: the higher the value of $d$, the more rapidly the prediction intervals increase in size. For $d=0$, the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will all be essentially the same.

This behaviour is seen in Figure 8.8 where $d=0$ and $c\neq0$. In this figure, the prediction intervals are the same for the last few forecast horizons, and the point forecasts are equal to the mean of the data.

The value of $p$ is important if the data show cycles. To obtain cyclic forecasts, it is necessary to have $p\geq2$ along with some additional conditions on the parameters. For an AR(2) model, cyclic behaviour occurs if $\phi_1^2+4\phi_2<0$. In that case, the average period of the cycles is¹

$$\frac{2\pi}{\arccos\bigl(-\phi_1(1-\phi_2)/(4\phi_2)\bigr)}.$$
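
For example, with the illustrative values $\phi_1=1.2$ and $\phi_2=-0.7$ (chosen here only to satisfy the cyclic condition, not taken from the text), the formula gives an average cycle of a little over eight observations:

```python
import numpy as np

# Average cycle period for an AR(2) with cyclic behaviour.
# phi1 and phi2 are illustrative values satisfying phi1^2 + 4*phi2 < 0.
phi1, phi2 = 1.2, -0.7
assert phi1**2 + 4 * phi2 < 0           # cyclic condition holds

period = 2 * np.pi / np.arccos(-phi1 * (1 - phi2) / (4 * phi2))
print(period)
```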

## ACF and PACF plots

It is usually not possible to tell, simply from a time plot, what values of $p$ and $q$ are appropriate for the data. However, it is sometimes possible to use the ACF plot, and the closely related PACF plot, to determine appropriate values for $p$ and $q$.

Recall that an ACF plot shows the autocorrelations which measure the relationship between $y_t$ and $y_{t-k}$ for different values of $k$. Now if $y_t$ and $y_{t-1}$ are correlated, then $y_{t-1}$ and $y_{t-2}$ must also be correlated. But then $y_t$ and $y_{t-2}$ might be correlated, simply because they are both connected to $y_{t-1}$, rather than because of any new information contained in $y_{t-2}$ that could be used in forecasting $y_t$.

To overcome this problem, we can use partial autocorrelations. These measure the relationship between $y_t$ and $y_{t-k}$ after removing the effects of the intervening lags $1, 2, 3, \dots, k-1$. So the first partial autocorrelation is identical to the first autocorrelation, because there is nothing between them to remove. The partial autocorrelations for lags 2, 3 and greater are calculated as follows:

$$\begin{aligned}
\alpha_k &= k\text{th partial autocorrelation coefficient} \\
&= \text{the estimate of } \phi_k \text{ in the autoregression model} \\
&\qquad y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_k y_{t-k} + e_t.
\end{aligned}$$

Varying the number of terms on the right hand side of this autoregression model gives $\alpha_k$ for different values of $k$. (In practice, there are more efficient algorithms for computing $\alpha_k$ than fitting all these autoregressions, but they give the same results.)
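
This definition translates directly into code. The sketch below (Python/NumPy; the AR(1) test series and its parameter are illustrative) computes each $\alpha_k$ as the last coefficient of a least squares AR($k$) regression, exactly as defined above:

```python
import numpy as np

def pacf(y, max_lag):
    """Partial autocorrelations: alpha_k is the last coefficient
    of an AR(k) regression fitted by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    alphas = []
    for k in range(1, max_lag + 1):
        # design matrix: a constant plus lagged values y_{t-1}, ..., y_{t-k}
        X = np.column_stack(
            [np.ones(T - k)] + [y[k - j : T - j] for j in range(1, k + 1)]
        )
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        alphas.append(coef[-1])     # the estimate of phi_k is alpha_k
    return alphas

# On a simulated AR(1) series, the PACF should cut off after lag 1.
rng = np.random.default_rng(1)
e = rng.normal(0, 1, 2000)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + e[t]

print([round(a, 2) for a in pacf(y, 4)])
```

The first value lands near 0.7 and the remaining ones near zero, which is the cut-off pattern used below to identify AR models.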

Figure 8.9 shows the ACF and PACF plots for the US consumption data shown in Figure 8.7.

The partial autocorrelations have the same critical values of $\pm 1.96/\sqrt{T}$ as ordinary autocorrelations, and these are typically shown on the plot as in Figure 8.9.

Figure 8.9: ACF and PACF of quarterly percentage change in US consumption.

A convenient way to produce a time plot, ACF plot and PACF plot in one command is to use the `tsdisplay` function in R.

R code
```r
par(mfrow=c(1,2))
Acf(usconsumption[,1], main="")
Pacf(usconsumption[,1], main="")
```

If the data are from an ARIMA($p,d,0$) or ARIMA($0,d,q$) model, then the ACF and PACF plots can be helpful in determining the value of $p$ or $q$. If both $p$ and $q$ are positive, then the plots do not help in finding suitable values of $p$ and $q$.

The data may follow an ARIMA($p,d,0$) model if the ACF and PACF plots of the differenced data show the following patterns:

• the ACF is exponentially decaying or sinusoidal;
• there is a significant spike at lag $p$ in the PACF, but none beyond lag $p$.

The data may follow an ARIMA($0,d,q$) model if the ACF and PACF plots of the differenced data show the following patterns:

• the PACF is exponentially decaying or sinusoidal;
• there is a significant spike at lag $q$ in the ACF, but none beyond lag $q$.

In Figure 8.9, we see that there are three spikes in the ACF and then no significant spikes thereafter (apart from one just outside the bounds at lag 14). In the PACF, there are three spikes decreasing with the lag, and then no significant spikes thereafter (apart from one just outside the bounds at lag 8). We can ignore one significant spike in each plot if it is just outside the limits, and not in the first few lags. After all, the probability of a spike being significant by chance is about one in twenty, and we are plotting 21 spikes in each plot. The pattern in the first three spikes is what we would expect from an ARIMA(0,0,3), as the PACF tends to decay exponentially. So in this case, the ACF and PACF lead us to the same model as was obtained using the automatic procedure.

1. arc cos is the inverse cosine function. You should be able to find it on your calculator. It may be labelled acos or $\cos^{-1}$.

# Estimation and order selection

## Maximum likelihood estimation

Once the model order has been identified (i.e., the values of $p$, $d$ and $q$), we need to estimate the parameters $c$, $\phi_1,\dots,\phi_p$, $\theta_1,\dots,\theta_q$. When R estimates the ARIMA model, it uses maximum likelihood estimation (MLE). This technique finds the values of the parameters which maximize the probability of obtaining the data that we have observed. For ARIMA models, MLE is very similar to the least squares estimates that would be obtained by minimizing

$$\sum_{t=1}^{T} e_t^2.$$

(For the regression models considered in Chapters 4 and 5, MLE gives exactly the same parameter estimates as least squares estimation.) Note that ARIMA models are much more complicated to estimate than regression models, and different software will give slightly different answers as they use different methods of estimation, or different estimation algorithms.

In practice, R will report the value of the log likelihood of the data; that is, the logarithm of the probability of the observed data coming from the estimated model. For given values of $p$, $d$ and $q$, R will try to maximize the log likelihood when finding parameter estimates.

## Information Criteria

Akaike’s Information Criterion (AIC), which was useful in selecting predictors for regression, is also useful for determining the order of an ARIMA model. It can be written as

$$\text{AIC} = -2\log(L) + 2(p+q+k+1),$$

where $L$ is the likelihood of the data, $k=1$ if $c\neq0$ and $k=0$ if $c=0$. Note that the last term in parentheses is the number of parameters in the model (including $\sigma^2$, the variance of the residuals).

For ARIMA models, the corrected AIC can be written as

$$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2},$$

and the Bayesian Information Criterion can be written as

$$\text{BIC} = \text{AIC} + (\log(T)-2)(p+q+k+1).$$

Good models are obtained by minimizing either the AIC, AICc or BIC. Our preference is to use the AICc.
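
These three formulas can be checked against the R output from Example 8.1. The sketch below uses log $L = -154.73$ with $p=0$, $q=3$, $k=1$, and a series length of $T=164$ (the value of $T$ that is consistent with the reported AICc, taken here as an assumption about the quarterly usconsumption series):

```python
import math

# Recompute AIC, AICc and BIC from the fitted ARIMA(0,0,3) in Example 8.1.
# T = 164 is an assumption, back-solved from the reported AICc value.
logL, p, q, k, T = -154.73, 0, 3, 1, 164
m = p + q + k + 1                        # parameter count, including sigma^2

aic = -2 * logL + 2 * m
aicc = aic + 2 * m * (m + 1) / (T - p - q - k - 2)
bic = aic + (math.log(T) - 2) * m
print(round(aic, 2), round(aicc, 2), round(bic, 2))   # matches the R output
```

All three values reproduce the R output (AIC = 319.46, AICc = 319.84, BIC = 334.96), confirming how the criteria are assembled from the log likelihood and the parameter count.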