Seasonal Adjustment

Seasonality

Seasonality is not particular to national accounts; it can occur in any time series. A time series is a set of observations collected at regular, consecutive time intervals. For our purpose these intervals are quarters, but they can be days, months or years as well. Observations can pertain to stocks at a particular moment in time or to flows over a whole period. The consecutive observations in a time series are usually not independent, and the aim of time series analysis is to extract as much information as possible from these interdependencies. The correlation between observations that we are interested in here is caused by seasonal effects. Seasonal movements can make features of a time series difficult or impossible to see. Hence, in addition to the original series, national accountants need to publish seasonally adjusted series as well, which can be interpreted more easily without a significant loss of information. Since seasonal effects are annual effects, the data must be collected at a sub-annual frequency, usually monthly or quarterly.

Calendar-related seasonal events are linked to particular periods in the calendar year. During winter the demand for fuel for heating will be higher and during summer holidays the level of production in some industries in which a significant number of businesses close down will be lower. Administrative arrangements can also have a calendar-related impact. For example, income tax refunds may occur in a particular quarter each year. Such seasonal effects are usually reasonably stable in terms of their timing, direction and scale and so the effects can be estimated and removed when the original series is being seasonally adjusted.

We can distinguish between the following types of calendar-related effects:

  • Trading day effect

  • Leap year effect

  • Effect of moving holidays

The trading-day effect is the impact on a time series of the number of particular types of days in a quarter. As can be seen from the table below, a calendar quarter comprises 90, 91 or 92 days. The level of output or expenditure in a quarter may be influenced by these different numbers of days. For example, if the level of activity were the same each day, then activity in the third and fourth quarters (92 days each) would be 2/90, or about 2.2%, higher than in the first quarter (90 days, except in leap years, when the first quarter has an additional day).
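
Quarter     Months                 Number of days
First       January - March        90 (91 in a leap year)
Second      April - June           91
Third       July - September       92
Fourth      October - December     92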

If a particular activity is confined to weekdays rather than weekend days, then the trading-day effect will vary from quarter to quarter and, for the same quarter, from one year to the next. As we see from the following table, if we take the working week to run from Monday to Friday, the first quarter usually has 64 working days (as in 2013, 2014 and 2015), but it can have 65 working days as well (as in 2007). In the leap year 2012 it also had 65 working days. This extra working day in 2012 is an example of the leap year effect.

When an activity is only undertaken on weekdays, the different number of weekdays will affect the level of that activity. For example, building permit offices are usually closed on Saturday and Sunday. Thus, the number of building permits issued in a given month or quarter is likely to be higher if the month contains a surplus of weekdays and lower if the month contains a surplus of weekend days.

Moving holidays are annual events whose exact timing shifts from year to year. Some moving holidays always fall in the same quarter each year, such as the Chinese New Year, which varies from late January to about three weeks into February. Other festivals, such as Easter and Ramadan, may fall in different quarters. Easter usually occurs in the second quarter but falls in the first quarter once every few years. Ramadan moves through the calendar year: it fell in the fourth quarter in 2005, occupied part of the third and fourth quarters in 2006 and 2007, was entirely in the third quarter from 2008 to 2013 and will occupy part of the second and third quarters in 2014 and 2015.

What is Seasonal Adjustment?

Seasonal Adjustment consists of the removal of seasonal effects from the original time series. Some adjustments such as trading day adjustments can be made without analyzing the series, simply by knowing whether there are differences in the number of trading days between quarters. In most cases we do not know the extent of the seasonal effects prior to analysis. Hence the challenge is to extract information on the seasonal pattern from the time series.

Statistical offices typically have dedicated software to carry out seasonal adjustment. For this reason NA Builder does not provide any facilities for it. However, because seasonal adjustment is an important part of QNA, there could be relevant design issues when building a QNA framework in NA Builder. We therefore give some background on seasonal adjustment below.

Time series consist of the following components:

  • Trend T(t) is a persistent upward or downward movement of the data over a long period of time

  • The seasonal variation Sn(t) refers to the pattern of change in the data that completes itself within a calendar year and then is repeated on a yearly basis

  • The cycle Cl(t) is the upward or downward change in the data that occurs over a duration of 2 to 10 years or longer

  • The irregular I(t) fluctuations are the erratic movements in a time series that have no definable pattern; this component is sometimes called white noise

The index “t” in the terms T(t), Sn(t), Cl(t) and I(t) indicates that these components are observed for each period t, which are quarters in our case.

The most important component of a time series is the trend: the long-term behavior of the series, i.e. without the interfering effects of seasonal and cyclical movements and without the irregular fluctuations.

How can we determine whether there is a trend? One method involves the use of the so-called Pearson's correlation coefficient:
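
r = [ n.∑ t.Y – ∑ t.∑ Y ] / √[ (n.∑ t² – (∑ t)²).(n.∑ Y² – (∑ Y)²) ]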

Here the ∑ symbol indicates that the argument following it needs to be summed over t, so ∑ Y = ∑ Y(t) = Y(1) + … + Y(n), with n the number of observations on the variable Y. We can use this coefficient as follows:

  • No correlation: r=0, no trend

  • Positive correlation: positive trend

  • Negative correlation: negative trend

To test whether the calculated coefficient is significantly different from zero, i.e. whether it is safe to assume that a trend is present, we can use the following test of significance (with n again being the number of observations and r the correlation coefficient):
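
tp = r.√(n – 2) / √(1 – r²)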

This “test statistic” tp follows a Student's t distribution with n – 2 degrees of freedom, so the critical value depends on n and on the chosen significance level. The value 2.306 used here is the two-sided 5% critical value for 8 degrees of freedom (n = 10). If the absolute value of the calculated statistic exceeds this threshold, we can assume a trend.

One of the most important methods to estimate a trend is supplied by regression analysis. Let us introduce the following regression model:

Y(t) = α + β.t + ε(t)

Here α and β are unknown parameters and ε(t) is the random error term. Regression analysis gives us formulas to calculate estimates for α and β which we will call a and b respectively. We will give the calculations by way of the following example (based on Introduction to time-series modeling and forecasting in Business and Economics, Gaynor & Kirkpatrick, 1994).

There are 19 time periods. The row “sum” in the above table gives the totals for the various columns and similarly for the average (denoted with a horizontal bar on top of the symbol). Given these values we can calculate a and b as follows (the average of t = 1..19 is 10):
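
b = [ ∑ t.Y – n.t̄.Ȳ ] / [ ∑ t² – n.t̄² ]

a = Ȳ – b.t̄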

We can then calculate the trend with the following equation (the caret or “hat” symbol on top of Y indicates that we have an estimate of Y, only including the trend in this model):

Ŷ(t) = -125.916 + 7.148 . t

The following graph shows the actual data and the calculated trend.

We saw earlier that, besides the trend, time series data can also contain seasonal (Sn) and cyclical (Cl) components. The difference between these two components is that seasonal variations repeat themselves within a calendar year, whereas cycles are upward or downward movements that stretch over periods of more than a year. We will ignore the cyclical component from now on and assume it to be part of the trend. So far we have seen how to model the trend. How do we model the seasonal component Sn? Earlier we introduced the idea of time series decomposition into trend, cycle, season and irregular components. Such a decomposition can be done additively or multiplicatively. Additive models are specified as

Y(t) = T(t) + Sn(t) + Cl(t) + I(t)

Multiplicative models are specified as

Y(t) = T(t) x Sn(t) x Cl(t) x I(t)

Here, we will concentrate on additive models. Techniques for multiplicative models are very similar.

Given the fact that seasonal effects repeat themselves after four quarters, it would seem to make sense to “average” them out. We present this technique by way of an example. In the following table quarterly observations on Y are given. In the column "MA" a 4-period moving average is calculated. The calculations for the yellow row 5 are given in the last row of the table, using the row numbers and column letters indicated in the table. Two consecutive moving averages are averaged again to obtain the centered moving average CMA, which is an estimate of the trend + cycle. The residual (= observation – trend – cycle) in column F consists of the seasonal component and the error (Sn + e). On the basis of these residuals the seasonal components are estimated in column G, by averaging similar quarters of consecutive years. The average value of these seasonal components should be zero; as is shown in row 19, the average value here is 0.3, which is therefore subtracted from the seasonal components in column H. The de-seasonalized series d in column I is obtained by subtracting the seasonal component from Y.
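
The same arithmetic is easy to express in code. The following is a minimal sketch in Python (pandas), assuming a quarterly series y with a DatetimeIndex; it mirrors the columns of the table: the moving average MA, the centered moving average CMA, the residual, the zero-mean seasonal components, and the de-seasonalized series.

    import pandas as pd

    def additive_seasonal_adjust(y: pd.Series) -> pd.Series:
        # 4-period moving average (column MA), labelled at the window's right edge.
        ma = y.rolling(4).mean()
        # Average two consecutive MAs and re-centre on an observation:
        # the centered moving average CMA, an estimate of trend + cycle.
        cma = ((ma + ma.shift(-1)) / 2).shift(-1)
        # Residual = observation - trend - cycle = seasonal + error (column F).
        resid = y - cma
        # Average similar quarters of consecutive years (column G), then force
        # the four seasonal components to average zero (column H).
        sn = resid.groupby(resid.index.quarter).mean()
        sn = sn - sn.mean()
        seasonal = pd.Series(y.index.quarter.map(sn), index=y.index)
        # De-seasonalized series d (column I).
        return y - seasonal

Applied to the observations of the table, this should reproduce column I up to rounding.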

Using the above results we obtain the following plot of the original and seasonally adjusted data.

The seasonally adjusted time series is therefore a combination of the trend (and cycle) and the irregular component.

What to do with Outliers?

Removing the trend and seasonal influences from a series leaves the irregular component. As the name implies, irregular events do not have a systematic seasonal influence underlying them. Examples of such events are strikes, unseasonal weather conditions and natural disasters. In a time series, irregular events result in short-term fluctuations that are not systematic (i.e. not captured in the trend and seasonal components). By its nature this irregular component is random and can sometimes be quite large. When the deviation of the irregular component from the trend is exceptionally large, it is referred to as an outlier. Outliers can make it difficult to measure the trend and seasonal components. Identifying outliers involves identifying those terms in the time series that deviate from the mean by more than a specified amount, such as two standard deviations. It is important to remember that outliers present in the data need to be included in the seasonally adjusted series.
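
As a minimal illustration, such a screening rule takes only a few lines in Python (pandas); here we flag the points of an irregular-component series that lie more than two standard deviations from its mean:

    import pandas as pd

    def flag_outliers(irregular: pd.Series, k: float = 2.0) -> pd.Series:
        # Standardize the irregular component and flag deviations larger
        # than k standard deviations (k = 2 by default).
        z = (irregular - irregular.mean()) / irregular.std()
        return z.abs() > k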

Three types of outliers are commonly recognized:

  • Additive Outlier (AO): the value of only one observation is affected. An AO may be caused by a random effect or by an identifiable cause such as a strike or bad weather

  • Temporary Change (TC): the value of one observation is extremely high or low, then the size of the deviation reduces gradually (exponentially) in the course of the subsequent observations until the time series returns to the initial level

  • Level Shift (LS): starting from a given time period, the level of the time series undergoes a permanent change. This may be caused by: changes in concepts and definitions of the survey population, changes in the data collection method, changes in legislation, etc.

These different types of outliers are illustrated in the following graph.

The X11 method

The earlier example illustrates the use of filter-based methods for seasonal adjustment. A filter is a weighted moving average where the weights sum to 1. Seasonal filters are the filters used to estimate the seasonal component. We will now examine a filter-based approach called X11, which was developed by the US Bureau of the Census and began operation in the United States in 1965. Ideally, seasonal filters are computed using values from the same quarter, as we did in column G in the above example. The seasonal filters available in X11 consist of seasonal moving averages of consecutive values within a given month or quarter. An n x m moving average is an n-term simple average of consecutive m-term simple averages. A 3 x 3 moving average of Y(t) is:
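
[ Y(t-2) + 2.Y(t-1) + 3.Y(t) + 2.Y(t+1) + Y(t+2) ] / 9

Since a 3 x 3 filter is a 3-term average of 3-term averages, the combined weights are 1, 2, 3, 2, 1, divided by 9. For the seasonal filters the lags refer to the same quarter in consecutive years rather than to consecutive quarters.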

The X11 procedure goes as follows:

  1. A first estimate of the trend component T(t) is made by computing a centered 2 x 12 moving average of the original series Y(t) (2 x 4 for quarterly series). Removing this trend estimate from the observations (by subtraction in the additive model, by division in the multiplicative model) gives a first estimate of the combined seasonal and irregular components Sn(t) and I(t)

  2. A preliminary estimate of the seasonal component is then found by applying a 3 x 3 moving average filter to the Sn(t) and I(t) from step 1 (the functional form depends on the choice of model: Sn(t) + I(t) for the additive model, Sn(t) . I(t) for the multiplicative model)

  3. A preliminary seasonally adjusted series is found by dividing the original series by the estimate of the seasonal component from the previous step (in the additive model the component is subtracted instead; the remaining steps are phrased for the multiplicative case)

  4. A so called Henderson moving average is applied to the seasonally adjusted values to give an improved estimate of the trend. The resulting trend cycle series is divided into the original series to give a second estimate of the seasonal and irregular components

  5. A final estimate of the seasonal component is then found by applying a weighted 3 x 5 moving average filter to the Sn(t) and I(t) from the previous step

  6. A final seasonally adjusted series is found by dividing the second estimate of the seasonal from the previous step into the original series

  7. A 9, 13 or 23 term Henderson moving average is applied to the seasonally adjusted series, which has been modified for outliers. This gives an improved estimate of the trend

The irregular component can then be estimated by dividing the trend estimates into the seasonally adjusted data.
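
Statistical offices run X11 and its successors as dedicated programs, but the filter-based idea can be tried out with a standard library. The sketch below uses seasonal_decompose from statsmodels, which performs a classical centered-moving-average decomposition; it illustrates the idea rather than the full X11 procedure with its iterations and Henderson filters. The file name is hypothetical.

    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Quarterly series with a DatetimeIndex; the file name is made up.
    y = pd.read_csv("gdp.csv", index_col=0, parse_dates=True).squeeze("columns")

    # Classical filter-based decomposition; period=4 for quarterly data.
    result = seasonal_decompose(y, model="multiplicative", period=4)
    seasonally_adjusted = y / result.seasonal  # divide out the seasonal component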

ARIMA models

Earlier we introduced the idea of time series decomposition into trend (T), cycle (Cl), season (Sn) and irregular component (I). We specified the additive models of such a decomposition as

Y(t) = T(t) + Sn(t) + Cl(t) + I(t)

and the multiplicative model as

Y(t) = T(t) . Sn(t) . Cl(t) . I(t)

We then modelled the trend using the regression model

Y(t) = α + β . t + ε(t)

Here α and β are unknown parameters and ε(t) is the random error term. The error terms are generally assumed to be independent, identically distributed random variables sampled from a normal distribution with zero mean. These assumptions may be weakened, but doing so will change the properties of the model.

So far it has been assumed that the time series observations Y(t) are statistically independent or, in other words, that the errors are random. If this is not true, then we should include in our models past values of the variable Y and/or past values of the error terms. This generates the class of models called ARIMA models.

If we include p past observations Y(t-1), Y(t-2), … Y(t-p) in the model we obtain an Autoregressive Model of order p: AR(p)
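
Y(t) = δ + φ1.Y(t-1) + φ2.Y(t-2) + … + φp.Y(t-p) + ε(t)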

Here δ and the φ’s are unknown parameters to be estimated and ε(t) is again the random error term. An AR model expresses a time series as a linear function of its past values. The order of the AR model indicates how many past values are included. Let us take GDP as an example and use quarterly time series. An AR(1) model where GDP depends on the quarter before is given by:
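
GDP(t) = δ + φ.GDP(t-1) + ε(t)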

Here δ and φ are parameters to be estimated. In this form we see that this model is analogous to the regression model we examined earlier. The name “autoregressive” refers to the regression on self (auto).

If we include q past error terms in a regression model we get a Moving Average model of order q: MA(q)
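
Y(t) = μ + ε(t) – θ1.ε(t-1) – θ2.ε(t-2) – … – θq.ε(t-q)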

Here μ and the θ’s are unknown parameters to be estimated (by the usual Box-Jenkins convention the θ’s enter with a minus sign). The time series is now regarded as an unevenly weighted moving average of a random series ε. An example of an MA(1) model is
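
Y(t) = μ + ε(t) – θ.ε(t-1)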

Here we include one past error term.

We have seen that the autoregressive model includes lagged terms on the time series itself, and that the moving average model includes lagged terms on the errors. By including both types of lagged terms, we arrive at what are called autoregressive-moving-average, or ARMA, models. An ARMA(p,q) model is
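
Y(t) = δ + φ1.Y(t-1) + … + φp.Y(t-p) + ε(t) – θ1.ε(t-1) – … – θq.ε(t-q)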

An example of an ARMA(1,1) model is
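
Y(t) = δ + φ.Y(t-1) + ε(t) – θ.ε(t-1)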

Using levels of GDP in the model may result in non-stationary behavior. To make the series stationary, so that they fluctuate around a constant mean, we can difference the original data. We can take first differences as follows

Z(t) = ΔY(t) = Y(t) - Y(t-1)

The ARMA model for first differences Z is called an ARIMA(p,1,q) model

An alternative notation uses the backward shift operator B, which shifts the time index back by one unit, so that B.Y(t) = Y(t-1). First differencing can then be written as

Z(t) = Y(t) - Y(t-1) = (1 – B) Y(t)

This is especially useful if we want to include seasonal differencing later as well. For quarterly data we can use

(1 – B⁴) Y(t)

An ARIMA(0,1,0) model is called a random walk (a random walk with drift when the constant µ is non-zero).

Y(t) - Y(t-1) = µ + ε(t)

An ARIMA(0,1,1) model without constant reflects simple exponential smoothing
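
Y(t) - Y(t-1) = ε(t) – θ.ε(t-1)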

Adding the constant changes this into exponential smoothing with growth.

Generalizing the above using differencing for any number of periods results in ARIMA(p,d,q) models which use d-period differencing, with d = the number of non-seasonal differences (we will add the seasonal differencing later).

If the model is specified in terms of differences Z, the model must be integrated to obtain the model in terms of Y. For example, suppose we identify and estimate the following model

Z(t) = 1.5647 - 0.4887.Z(t-1) - 0.4763.Z(t-2)

Substituting for Z and solving for Y, we get

Y(t) - Y(t-1) = 1.5647 - 0.4887.[Y(t-1) - Y(t-2)] - 0.4763. [Y(t-2) - Y(t-3)]

or, after collecting terms,

Y(t) = 1.5647 + 0.5113.Y(t-1) + 0.0124.Y(t-2) + 0.4763.Y(t-3)
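
The collection of terms can be verified mechanically, for instance with a few lines of sympy:

    import sympy as sp

    # Symbols standing for Y(t-1), Y(t-2) and Y(t-3).
    y1, y2, y3 = sp.symbols("Y1 Y2 Y3")
    # Right-hand side before collecting terms.
    rhs = y1 + 1.5647 - 0.4887 * (y1 - y2) - 0.4763 * (y2 - y3)
    print(sp.expand(rhs))  # 0.5113*Y1 + 0.0124*Y2 + 0.4763*Y3 + 1.5647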

Seasonality in ARIMA models

When we specify ARIMA models for seasonal data we may apply not only regular differencing as in the models above (usually first or second differences) but also seasonal differencing (with period L) to come to a stationary series that can be modeled (such seasonal differencing removes the seasonal effect). We can achieve this by transforming to the following variable:

Z(t) = Y(t) - Y(t-L)

We can then also take first differences of the seasonal differences

[Y(t) - Y(t-L)] – [Y(t-1) - Y(t-L-1)]

A simple example is given by the following MA model, with a lag of 12 months (assuming monthly time series)

Z(t) = -0.6824.e(t-12)

We must then again substitute for Z and solve for Y

[Y(t) - Y(t-1)] - [Y(t-12) - Y(t-13)] = -0.6824.e(t-12)

which will yield the equation

Y(t) = Y(t-1) + Y(t-12) - Y(t-13) - 0.6824.e(t-12)

We can generalize this by introducing D as the seasonal order of differences and then allowing for a seasonal ARIMA model with parameters P for the number of AR terms and Q for the number of MA terms: ARIMA(P,D,Q)

Combining the ARIMA models for the regularly differenced time series and for the seasonally differenced time series, we obtain a so-called seasonal ARIMA or SARIMA(p,d,q)(P,D,Q) model.

When there is no seasonal effect, a SARIMA model reduces to pure ARIMA (p,d,q) and when the time series dataset is stationary a pure ARIMA reduces to ARMA(p,q). Often the letter “S” from SARIMA is dropped.

Let us look at the example of an ARIMA(1,1,0)(0,1,0) model for GDP. We want an AR(1) model as before but now with two lag structures, one introducing first differences in the main model, and the other introducing a seasonal lag of 4, assuming quarterly data now. Using the B operator introduced earlier (instead of transforming to the Z variable) we can denote these lags as a combination:
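
(1 – B)(1 – B⁴) Y(t)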

This term is then substituted into the AR(1) model
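
(1 – φ.B)(1 – B)(1 – B⁴) Y(t) = δ + ε(t)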

An example of an ARIMA(0,0,1)(0,1,0) model for GDP is
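
(1 – B⁴) GDP(t) = μ + ε(t) – θ.ε(t-1)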

One of the most widely used models is the so-called AIRLINE model, which is ARIMA(0,1,1)(0,1,1). For quarterly GDP we get
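
(1 – B)(1 – B⁴) GDP(t) = (1 – θ.B)(1 – Θ.B⁴) ε(t)

Here θ is the regular and Θ the seasonal moving average parameter.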

Reg-ARIMA models

A Reg-ARIMA model is a regression model with ARIMA errors. The regression model is used to capture effects that we encountered in the first part of this paper, such as outliers, trading days, and moving holidays.

The general expression for a Reg-ARIMA model is
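
Y(t)/D(t) = βᵀX(t) + Z(t)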

Here D(t) are prior adjustments of the observed variable Y, such as leap-year adjustment. X(t) is a vector of regressors for trading day, holiday or calendar effects, and for outliers. The regressor parameters to be estimated are collected in the transposed vector β. Z(t) is assumed to follow an ARIMA process.

The Easter effect discussed in part 1 of this paper may be introduced as follows using “dummy” variables

X(t) = 1 if t0 < t < t0 + L

X(t) = 0 otherwise

Here t0 indicates the beginning of the Easter period and L the length of the period.

A level shift outlier at t = t0 can be introduced as

X(t) = 1 if t ≥ t0

X(t) = 0 if t < t0

The log (logarithmic) transformation is applied when the model is multiplicative, thereby converting it into an additive model.

The Reg-ARIMA model is used in the TRAMO/SEATS procedure for seasonal adjustment, which we will explore below.

Specifying ARIMA models

ARIMA modeling proceeds by a series of well-defined steps. The first step is to identify the model. The procedure of identifying the correct model (stationary or differenced data; which orders p and q) is known as the Box-Jenkins methodology. Important tools in identifying the correct model are the following two statistics:

  • Autocorrelation coefficient (acf) of order k, indicating how the observations correlate with observations k periods earlier

  • Partial autocorrelation coefficient (pacf) of order k, indicating how the observations correlate with observations k periods earlier, where the effect of the intervening time periods has been removed
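
r(k) = ∑ [Y(t) – Ȳ].[Y(t+k) – Ȳ] / ∑ [Y(t) – Ȳ]²

where the numerator is summed over t = 1..n-k and the denominator over t = 1..n, and

r(kk) = [ r(k) – ∑ r(k-1, j).r(k-j) ] / [ 1 – ∑ r(k-1, j).r(j) ]   (sums over j = 1..k-1)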

Here r(k) is the autocorrelation coefficient for observations k lags apart, and r(kj) is the partial autocorrelation coefficient for k lags apart when the effect of j intervening lags has been removed; the latter is calculated recursively as
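
r(kj) = r(k-1, j) – r(kk).r(k-1, k-j)   for j = 1, 2, …, k-1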

By definition, r(11) = r(1).

For example, using r(11) = r(1):
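
r(22) = [ r(2) – r(1)² ] / [ 1 – r(1)² ]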

For both acf and pacf we can test for significance, using appropriate t-statistics (not given here).

Given a time-series we can calculate the acf and pacf for a number of lags and test for significance. We can then obtain the following situations:

  • The statistic "cuts off" after a certain lag, i.e. the coefficients become abruptly much smaller

  • The statistic "dies down", i.e. the coefficients gradually become smaller and smaller

Very roughly we can say the following:

  • If the acf cuts off after lag q and the pacf dies down, then we have an MA(q) model

  • If the acf dies down and the pacf cuts off after lag p, then we have an AR(p) model

  • If both acf and pacf die down, we have an ARMA model (or, after differencing d times, an ARIMA(p,d,q) model); see the sketch after this list
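
To illustrate these rules, the sketch below simulates an AR(1) series with φ = 0.7 and computes the sample acf and pacf with statsmodels; by the rules above, the acf should die down geometrically while the pacf should cut off after lag 1.

    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    # Simulate 200 observations of an AR(1) process with phi = 0.7.
    rng = np.random.default_rng(0)
    y = np.zeros(200)
    for t in range(1, 200):
        y[t] = 0.7 * y[t - 1] + rng.normal()

    print(acf(y, nlags=8))   # dies down, roughly like 0.7**k
    print(pacf(y, nlags=8))  # cuts off after lag 1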

The second step is to estimate the coefficients of the model. Coefficients of AR models can be estimated by least-squares regression. Estimation of the parameters of MA and ARMA models usually requires a more complicated iterative procedure. In practice, estimation is fairly transparent to the user, as it is accomplished automatically by a computer program with little or no user interaction.

The third step is to check the model. This step is also called diagnostic checking, or verification. Two important elements of checking are to ensure that the residuals of the model are random and that the estimated parameters are statistically significant. Usually the fitting process is guided by the principle of parsimony, by which the best model is the simplest possible model (the one with the fewest parameters) that adequately describes the data.

TRAMO/SEATS

TRAMO/SEATS is a seasonal adjustment program developed by Agustin Maravall and Victor Gomez at the Bank of Spain. TRAMO (Time series Regression with ARIMA noise, Missing observations, and Outliers) and SEATS (Signal Extraction in ARIMA Time Series) are linked programs. TRAMO provides automatic ARIMA modeling, while SEATS computes the components for seasonal adjustment. SEATS derives its seasonal and trend filters from an ARIMA model that describes the behavior of the series, so the filters are tailored to that series.

TRAMO estimates a Reg-ARIMA model. Regressors in this model include:

  • Dummy variables for additive outliers (AO), level shift (LS) and temporary changes (TC)

  • The number of working days, #(Mondays + Tuesdays + … + Fridays) - #Saturdays - #Sundays

  • #Mondays - #Sundays, … , #Saturdays - #Sundays

  • The number of days in a quarter/month

  • Easter effect

TRAMO also produces a set of diagnostics to check whether the ARIMA model is adequate. If no ARIMA model is specified, TRAMO has an algorithm that searches for a suitable one.

The residual series (i.e. the part of the observed variable not explained by the TRAMO regression model) is input to SEATS, where the decomposition into trend, cycle, season and irregular component is made.

TRAMO/SEATS is used by many European national statistical offices.

Further development of X11

Recall that earlier we explored the filter-based X11 method. One limitation of this procedure is the low quality of the asymmetric filters at the ends of the time series. Also, the choice of filters is rather limited.

The X11 method was extended to X11-ARIMA at Statistics Canada. ARIMA modeling of the original series is used to extend the series at both ends by forecasting and backcasting using the estimated model. The extended series is then seasonally adjusted with X11.

Limitations of X11-ARIMA are that there are no user-defined regressors for special situations and that the ARIMA modeling is not robust against outliers (including level shifts).

Subsequently, the US Bureau of the Census developed X12-ARIMA, which included Reg-ARIMA modelling, allowing outliers and other distorting effects to be detected and adjusted for (improving the forecasts) and additional components such as calendar effects to be detected and estimated. Also, a wide variety of seasonal and trend filter options were included. This is the current method for statistical agencies in the United States, the UK, Canada, New Zealand, Japan and other countries.

An advantage of the European SEATS is that the seasonal adjustment filter is determined by a model, not by a finite set of moving average filters as in X12. For this reason the US Census Bureau and the developers of SEATS have developed a new procedure called X-13ARIMA-SEATS (X-13A-S), which combines X-12-ARIMA and SEATS.
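
As a closing illustration, X-13ARIMA-SEATS can be driven from Python through the statsmodels wrapper x13_arima_analysis. This is a minimal sketch: it assumes the Census Bureau's X-13ARIMA-SEATS binary is installed and on the search path, and the file name is made up.

    import pandas as pd
    from statsmodels.tsa.x13 import x13_arima_analysis

    # Quarterly series with a DatetimeIndex; the file name is hypothetical.
    y = pd.read_csv("gdp.csv", index_col=0, parse_dates=True).squeeze("columns")

    # Automatic Reg-ARIMA modelling with outlier detection.
    result = x13_arima_analysis(y, outlier=True)
    print(result.seasadj.head())    # seasonally adjusted series
    print(result.trend.head())      # trend component
    print(result.irregular.head())  # irregular component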