
Time Series Forecasting Performance Metrics Comparison

Forecasting is the process of making predictions of the future based on past and present data, most commonly by analysis of trends. A commonplace example might be estimation of some variable of interest at some specified future date. Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods. Usage can differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

Risk and uncertainty are central to forecasting and prediction; it is generally considered good practice to indicate the degree of uncertainty attaching to forecasts. In any case, the data must be up to date in order for the forecast to be as accurate as possible.

Forecasting accuracy

The forecast error (also known as a residual) is the difference between the actual value and the forecast value for the corresponding period.

$$E_t = Y_t - F_t$$

where E is the forecast error at period t, Y is the actual value at period t, and F is the forecast for period t.

A good forecasting method will yield residuals that are uncorrelated and have zero mean. If there are correlations between residual values, then there is information left in the residuals which should be used in computing forecasts. If the residuals have a mean other than zero, then the forecasts are biased.
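
The two residual properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `residual_summary` and the sample numbers are illustrative and not part of the original text. It reports the mean residual (a nonzero value suggests bias) and the lag-1 sample autocorrelation (a value far from zero suggests information left in the residuals).

```python
import numpy as np

def residual_summary(actual, forecast):
    """Return (mean residual, lag-1 sample autocorrelation of the residuals)."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)  # E_t = Y_t - F_t
    mean_e = e.mean()
    d = e - mean_e
    denom = np.sum(d ** 2)
    # Lag-1 autocorrelation; reported as 0.0 when the residuals are constant
    r1 = float(np.sum(d[1:] * d[:-1]) / denom) if denom > 0 else 0.0
    return float(mean_e), r1

# Hypothetical data: every forecast is exactly 1 unit too low, so the mean residual is 1
biased_mean, biased_r1 = residual_summary([10, 12, 11, 13, 12], [9, 11, 10, 12, 11])

# Hypothetical data: residuals alternate in sign, so they are strongly (negatively) correlated
alt_mean, alt_r1 = residual_summary([1, 2, 3, 4], [0, 3, 2, 5])
```

A single lag is only a rough check; a fuller diagnosis would look at several lags.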

Measures of aggregate error:

Scale-dependent errors: The forecast error, E, is on the same scale as the data. Accuracy measures based on it are therefore scale-dependent and cannot be used to make comparisons between series on different scales.
Mean absolute error (MAE) or mean absolute deviation (MAD): $\mathrm{MAE} = \mathrm{MAD} = \frac{\sum_{t=1}^{N} |E_t|}{N}$
Mean squared error (MSE) or mean squared prediction error (MSPE): $\mathrm{MSE} = \frac{\sum_{t=1}^{N} E_t^2}{N}$
Root mean squared error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N} E_t^2}{N}}$
Average of errors ($\bar{E}$): $\bar{E} = \frac{\sum_{t=1}^{N} E_t}{N}$
Percentage Errors: These are more frequently used to compare forecast performance between different data sets because they are scale-independent. However, they have the disadvantage of being infinite or undefined if Y is close to or equal to zero.
Mean absolute percentage error (MAPE) or mean absolute percentage deviation (MAPD): $\mathrm{MAPE} = 100 \cdot \frac{\sum_{t=1}^{N} \left|\frac{E_t}{Y_t}\right|}{N}$

$$\mathrm{MAPD} = \frac{\sum_{t=1}^{N} |E_t|}{\sum_{t=1}^{N} |Y_t|}$$

Scaled Errors: Hyndman and Koehler (2006) proposed using scaled errors as an alternative to percentage errors.
Mean absolute scaled error (MASE): $\mathrm{MASE} = \frac{\sum_{t=1}^{N} \left| \frac{E_t}{\frac{1}{N-m} \sum_{t=m+1}^{N} |Y_t - Y_{t-m}|} \right|}{N}$

where $m$ is the seasonal period, or 1 if the series is non-seasonal.

Other Measures:
Forecast skill (SS): $\mathrm{SS} = 1 - \frac{\mathrm{MSE}_{\text{forecast}}}{\mathrm{MSE}_{\text{ref}}}$

Business forecasters and practitioners sometimes use different terminology in the industry. They refer to the PMAD as the MAPE, although they compute this as a volume weighted MAPE.[10] For more information see Calculating demand forecast accuracy.

When comparing the accuracy of different forecasting methods on a specific data set, the measures of aggregate error are compared with each other and the method that yields the lowest error is preferred.
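
The aggregate measures listed above can be computed in a few lines. This is a minimal sketch assuming NumPy; the function names `aggregate_errors` and `forecast_skill` and the sample values are illustrative choices, not part of the original text.

```python
import numpy as np

def aggregate_errors(actual, forecast):
    """Compute the scale-dependent and percentage measures from E_t = Y_t - F_t."""
    y = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = y - f
    return {
        "ME": float(e.mean()),                             # average of errors
        "MAE": float(np.abs(e).mean()),                    # same as MAD
        "MSE": float((e ** 2).mean()),
        "RMSE": float(np.sqrt((e ** 2).mean())),
        "MAPE": float(100 * np.abs(e / y).mean()),         # infinite/undefined if any Y_t is 0
        "MAPD": float(np.abs(e).sum() / np.abs(y).sum()),
    }

def forecast_skill(mse_forecast, mse_ref):
    """Forecast skill SS = 1 - MSE_forecast / MSE_ref."""
    return 1 - mse_forecast / mse_ref

# Hypothetical actuals and forecasts
metrics = aggregate_errors([100, 200, 400], [110, 190, 400])
skill = forecast_skill(50.0, 100.0)  # forecast has half the MSE of the reference
```

Here the average error is zero because the over- and under-forecasts cancel, while MAE and RMSE remain positive, which is why the average of errors measures bias rather than accuracy.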

Training and test sets

It is important to evaluate forecast accuracy using genuine forecasts. That is, it is invalid to look at how well a model fits the historical data; the accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model. When choosing models, it is common to use a portion of the available data for fitting, and use the rest of the data for testing the model.[11]

Cross Validation

Cross-validation is a more sophisticated version of the training/test set approach.

For cross-sectional data, cross-validation works as follows:

  1. Select observation i for the test set, and use the remaining observations in the training set. Compute the error on the test observation.
  2. Repeat the above step for i = 1,2,..., N where N is the total number of observations.
  3. Compute the forecast accuracy measures based on the errors obtained.

This is a much more efficient use of the available data, as only one observation is omitted at each step.
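
The leave-one-out steps above can be sketched as follows. This is an illustration only, assuming NumPy; the helper `loo_errors` and the trivial "predict the training mean" model are hypothetical stand-ins for whatever model is actually being evaluated.

```python
import numpy as np

def loo_errors(y, fit_predict):
    """Leave-one-out errors: hold out observation i, fit on the rest, predict i."""
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        train = np.delete(y, i)                   # all observations except i
        errors.append(y[i] - fit_predict(train))  # error on the held-out observation
    return np.array(errors)

def mean_model(train):
    """Hypothetical model: predict the mean of the training observations."""
    return train.mean()

errs = loo_errors([1, 2, 3, 6], mean_model)
loo_mae = float(np.abs(errs).mean())  # accuracy measure computed from the N errors
```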

For time series data, the training set can only include observations prior to the test set, so no future observations can be used in constructing the forecast. Suppose k observations are needed to produce a reliable forecast. The process then works as follows:

  1. Select the observation k+i for the test set, and use the observations at times 1, 2, ..., k+i-1 to estimate the forecasting model. Compute the error on the forecast for time k+i.
  2. Repeat the above step for i = 1,2,...,T-k where T is the total number of observations.
  3. Compute the forecast accuracy measures over all errors.

This procedure is sometimes known as a "rolling forecasting origin" because the "origin" (k+i-1) on which the forecast is based rolls forward in time.[12]
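
The rolling forecasting origin can be sketched as a loop over i. This is a minimal sketch assuming NumPy; `rolling_origin_errors` is a hypothetical helper, and the naive "last value" forecaster is used only to keep the example short.

```python
import numpy as np

def rolling_origin_errors(y, k, forecast_one_step):
    """One-step errors for test points k+1, ..., T, each using only earlier data."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    errors = []
    for i in range(1, T - k + 1):
        train = y[: k + i - 1]   # observations at times 1, ..., k+i-1
        target = y[k + i - 1]    # observation at time k+i (0-based index k+i-1)
        errors.append(target - forecast_one_step(train))
    return np.array(errors)

def naive(train):
    """Hypothetical forecaster: repeat the last observed value."""
    return train[-1]

errs = rolling_origin_errors([3, 5, 4, 6, 8, 7], k=3, forecast_one_step=naive)
rolling_mae = float(np.abs(errs).mean())
```

With T = 6 and k = 3 the loop produces T - k = 3 errors, one for each of the last three observations.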

Limitations of Errors

The two most popular measures of accuracy that incorporate the forecast error are the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE). Both are scale-dependent, that is, they are on the same scale as the original data. Consequently, they cannot be used to compare forecast accuracy between series on different scales.

Percentage errors are simply forecast errors converted into percentages and are given by $P_t = 100 E_t / Y_t$. A common accuracy measure that utilizes this is the Mean Absolute Percentage Error (MAPE). This allows for comparison between data on different scales. However, percentage errors are not meaningful when $Y_t$ is close to or equal to zero, because they then take extreme values or are undefined.[13] Scaled errors are a helpful alternative to percentage errors when comparing between different scales. They do not have the shortfall of giving unhelpful values if $Y_t$ is close to or equal to zero.

Mean absolute percentage error

The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of prediction accuracy of a forecasting method in statistics, for example in trend estimation. It usually expresses accuracy as a percentage, and is defined by the formula:

$$\mathrm{M} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|,$$

where $A_t$ is the actual value and $F_t$ is the forecast value.

The difference between $A_t$ and $F_t$ is divided by the actual value $A_t$. The absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points $n$. Multiplying by 100 turns the result into a percentage error.

Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application [1]

  • It cannot be used if there are zero values (which sometimes happens, for example, in demand data) because there would be a division by zero.
  • For forecasts which are too low the percentage error cannot exceed 100% (as long as the forecasts are non-negative), but for forecasts which are too high there is no upper limit to the percentage error.
  • When MAPE is used to compare the accuracy of prediction methods it is biased in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the ratio of the predicted to actual value (called the Accuracy Ratio). This approach has superior statistical properties and leads to predictions which can be interpreted in terms of the geometric mean.
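
The asymmetry described in the list above is easy to see on a single period. The sketch below uses plain Python; the function name `absolute_percentage_error` and the numbers are illustrative, and it assumes non-negative actuals and forecasts.

```python
def absolute_percentage_error(actual, forecast):
    """Absolute percentage error 100 * |A - F| / |A| for a single period."""
    if actual == 0:
        # Division by zero: the percentage error does not exist for this period
        raise ZeroDivisionError("percentage error is undefined when the actual value is zero")
    return 100.0 * abs(actual - forecast) / abs(actual)

actual = 100.0
# Forecasts that are too low: the error approaches, but never exceeds, 100%
under = [absolute_percentage_error(actual, f) for f in (50.0, 10.0, 0.0)]
# Forecasts that are too high: the error grows without limit
over = [absolute_percentage_error(actual, f) for f in (150.0, 300.0, 1000.0)]
```

Because large penalties are only possible on the high side, a method that tends to forecast low can end up with the smaller MAPE.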

Mean absolute scaled error

In statistics, the mean absolute scaled error (MASE) is a measure of the accuracy of forecasts. It was proposed in 2005 by statistician Rob J. Hyndman and Professor of Decision Sciences Anne B. Koehler, who described it as a "generally applicable measurement of forecast accuracy without the problems seen in the other measurements."[1] The mean absolute scaled error has favorable properties when compared to other methods for calculating forecast errors, such as root-mean-square deviation, and is therefore recommended for determining comparative accuracy of forecasts.


The mean absolute scaled error has the following desirable properties:[3]

  1. Scale invariance: The mean absolute scaled error is independent of the scale of the data, so can be used to compare forecasts across data sets with different scales.
  2. Predictable behavior as $y_t \rightarrow 0$: Percentage forecast accuracy measures such as the mean absolute percentage error (MAPE) rely on division by $y_t$, skewing the distribution of the MAPE for values of $y_t$ near or equal to 0. This is especially problematic for data sets whose scales do not have a meaningful 0, such as temperature in Celsius or Fahrenheit, and for intermittent demand data sets, where $y_t = 0$ occurs frequently.
  3. Symmetry: The mean absolute scaled error penalizes positive and negative forecast errors equally, and penalizes errors in large forecasts and small forecasts equally. In contrast, the MAPE and median absolute percentage error (MdAPE) fail both of these criteria, while the "symmetric" sMAPE and sMdAPE[4] fail the second criterion.
  4. Interpretability: The mean absolute scaled error can be easily interpreted, as values greater than one indicate that in-sample one-step forecasts from the naïve method perform better than the forecast values under consideration.
  5. Asymptotic normality of the MASE: The Diebold-Mariano test for one-step forecasts is used to test the statistical significance of the difference between two sets of forecasts. To perform hypothesis testing with the Diebold-Mariano test statistic, it is desirable for $DM \sim N(0,1)$, where $DM$ is the value of the test statistic. The DM statistic for the MASE has been empirically shown to approximate this distribution, while the mean relative absolute error (MRAE), MAPE and sMAPE do not.[2]

Non-seasonal time series

For a non-seasonal time series,[5] the mean absolute scaled error is estimated by

$$\mathrm{MASE} = \frac{1}{T} \sum_{t=1}^{T} \left( \frac{|e_t|}{\frac{1}{T-1} \sum_{t=2}^{T} |Y_t - Y_{t-1}|} \right) = \frac{\sum_{t=1}^{T} |e_t|}{\frac{T}{T-1} \sum_{t=2}^{T} |Y_t - Y_{t-1}|}$$ [3]

where the numerator $e_t$ is the forecast error for a given period, defined as the actual value ($Y_t$) minus the forecast value ($F_t$) for that period: $e_t = Y_t - F_t$, and the denominator is the mean absolute error of the one-step "naive forecast method" on the training set,[5] which uses the actual value from the prior period as the forecast: $F_t = Y_{t-1}$.[6]

Seasonal time series

For a seasonal time series, the mean absolute scaled error is estimated in a manner similar to the method for non-seasonal time series:

$$\mathrm{MASE} = \frac{1}{T} \sum_{t=1}^{T} \left( \frac{|e_t|}{\frac{1}{T-m} \sum_{t=m+1}^{T} |Y_t - Y_{t-m}|} \right) = \frac{\sum_{t=1}^{T} |e_t|}{\frac{T}{T-m} \sum_{t=m+1}^{T} |Y_t - Y_{t-m}|}$$ [5]

The main difference from the method for non-seasonal time series is that the denominator is the mean absolute error of the one-step "seasonal naive forecast method" on the training set,[5] which uses the actual value from the prior season as the forecast: $F_t = Y_{t-m}$,[6] where $m$ is the seasonal period.
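
Both the non-seasonal and the seasonal form can be covered by one function with a seasonal period `m`. This is a minimal sketch assuming NumPy; the function name `mase` and the sample data are illustrative. As an assumption of the sketch, the forecast errors and the training series are passed separately, so the errors may come from a different period than the one used for the naive scaling term; the formulas above use the same T periods for both.

```python
import numpy as np

def mase(errors, train, m=1):
    """Mean absolute scaled error; m is the seasonal period (1 if non-seasonal)."""
    e = np.asarray(errors, dtype=float)
    y = np.asarray(train, dtype=float)
    if len(y) <= m:
        raise ValueError("training series must be longer than the seasonal period m")
    # In-sample MAE of the (seasonal) naive forecast F_t = Y_{t-m}
    scale = np.abs(y[m:] - y[:-m]).mean()
    if scale == 0:
        raise ValueError("MASE is undefined: the naive forecast has zero in-sample error")
    return float(np.abs(e).mean() / scale)

train = [10, 12, 11, 13, 12]   # hypothetical training series
errors = [1.5, -3.0]           # hypothetical forecast errors e_t = Y_t - F_t

mase_nonseasonal = mase(errors, train)     # naive differences 2, 1, 2, 1 give scale 1.5
mase_seasonal = mase(errors, train, m=2)   # lag-2 differences 1, 1, 1 give scale 1.0
```

A value above one means the forecasts under consideration did worse, on average, than the in-sample naive forecast used for scaling.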

This scale-free error metric can be used to compare forecast methods on a single series and also to compare forecast accuracy between series. This metric is well suited to intermittent-demand series because it never gives infinite or undefined values[1] except in the irrelevant case where all historical data are equal.[3]

When comparing forecasting methods, the method with the lowest MASE is the preferred method.