Terminology
Forecast - Estimate of some future event(s)
Time series - Chronological sequence of observations
Stationary series - Series that shows only random variation, with no trend or seasonality. It exhibits the same statistical properties (mean, median, standard deviation) over any time range. If a series is driven by an external factor, it cannot be stationary until the effect of that factor is removed
Auto-correlation - Correlation of a variable with a time-shifted (lagged) copy of itself
Classification
Short-term forecast - A day, week or month ahead
Medium-term forecast - One to two years ahead
Long-term forecast - Several years into the future
Application
Operations management - Forecasting product sales
Marketing - Forecasting of sales response to advertisement
Finance and risk management
Process control -
Demography - Forecasting population by country
Methodology
Based on identifying, modelling and extrapolating the patterns found in historical data
Forecasting techniques
Quantitative method
Makes use of historical data and a forecasting model
The model summarizes the patterns in the data and establishes a statistical relationship between past and current values
The model is then used to extrapolate past and current behaviour into the future
Qualitative method -
Properties
Requires judgement of the expert
Subjective in nature
Methods
Delphi method
A panel of experts, not sitting together, each produce a forecast. Their opinions are combined and refined until a final forecast is reached
Quantitative models
Regression model
Makes use of one or more predictor variables related to the variable of interest
Smoothing model
Uses a simple function of previous observations to forecast the variable of interest
Single smoothing model
Double smoothing model
...
General time series model
Uses the statistical properties of the historical data to specify a formal model, then estimates the unknown parameters (generally by the least squares method)
Dynamic regression - It makes use of past values and also correlation with other variables
Intervention model
y is related not only to its own past values but also to external factors (e.g. government policies such as a lockdown)
Formula: y_t = c + phi1*y_(t-1) + phi2*y_(t-2) + ... + theta1*e_(t-1) + theta2*e_(t-2) + ... + b1*x1 + b2*x2 + ... + e_t
Here x1, x2, ... are the external factors and e_t is the error at time t
Steps
Draw scatter plot for past values
Check if the series is stationary or not
Plot ACF and PACF to judge candidate orders (PACF suggests the AR order p, ACF suggests the MA order q)
Develop model using auto ARIMA
Select model minimizing AIC
Fit the model
In the example, X = status, where status is the external factor
Check model accuracy
MSE check
MAE check
RMSE
Mean absolute percent error
Draw an actual vs predicted plot (scatter plot)
to see visually how close the predicted values are to the actual ones
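The accuracy checks listed above can be sketched with plain NumPy. The function name forecast_errors and the sample numbers are illustrative assumptions, not part of any library.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MSE, MAE, RMSE and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted
    mse = np.mean(error ** 2)
    mae = np.mean(np.abs(error))
    rmse = np.sqrt(mse)
    # MAPE is undefined when an actual value is 0; this sketch assumes none are.
    mape = np.mean(np.abs(error / actual)) * 100
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape}

metrics = forecast_errors([100, 200, 400], [110, 190, 380])
print(metrics)
```

MSE and RMSE penalise large misses more heavily than MAE, while MAPE is scale-free and therefore easier to compare across series.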
Dynamic regression
It is an extension of the intervention model
It makes use of past values and also correlation with other variables
Steps
Preliminary analysis
No need to make series stationary
Even if the ACF and PACF plots do not show anything useful, go ahead with auto ARIMA
Auto ARIMA is used to choose the best model
Call fit with the best ARIMA model selected by auto ARIMA
Plot ACF and PACF of the residuals to check for leftover auto-correlation (normality is checked with a probability plot instead)
Check errors
MSE
MAE
Draw scatter plot for actual vs predicted
Drop insignificant features
Features with p-value > 0.05 are treated as insignificant and dropped
Retrain model with reduced feature set
Time series method
Exponential smoothing
ARIMA
Form of forecast
Point estimate or point forecast
A single value that represents the best estimate of the future value of the variable of interest
They are almost always wrong
Forecast error
Difference between predicted value and actual value
Prediction interval
A range of values for the future observation
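A simplified sketch of how a point forecast, the forecast errors and a prediction interval relate. It assumes a stationary series forecast by its historical mean and roughly normal errors. The 1.96 multiplier gives an approximate 95% interval and ignores the uncertainty in the estimated mean, so treat it as an illustration only.

```python
import numpy as np

history = np.array([8.0, 10.0, 12.0, 10.0])

# Point forecast: for this sketch, simply the historical mean.
point_forecast = history.mean()

# Forecast errors on the history (actual - predicted) and their spread.
errors = history - point_forecast
sigma = errors.std(ddof=1)

# Approximate 95% prediction interval, assuming roughly normal errors.
z = 1.96
lower = point_forecast - z * sigma
upper = point_forecast + z * sigma
print(point_forecast, round(lower, 2), round(upper, 2))
```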
Activities in forecasting process
Problem definition
Data collection
Preliminary data analysis
Model selection and fitting
Model validation
Forecast model deployment
Monitoring forecasting model performance
Ongoing activity after the deployment to ensure that the model is still performing satisfactorily
Exploration of time series data
It starts with time series plot.
Reveal patterns
Seasonal pattern
Trend
Level shifts
Cycles or Periods
Unusual observations
Preliminary data analysis
Plot and see
Use scatter plot for this (pyplot.scatter)
Check if series is stationary or not
Many statistical models work only on a stationary series
Convert the series to a stationary one
Take the difference of each value with an earlier value
If there is a trend, difference with the previous value
If there is seasonality, difference with the value one seasonal period earlier
Keep repeating until the series is stationary. In most cases at most 2 iterations are needed
Once the series is stationary, go ahead and create the model
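The two kinds of differencing can be sketched with pandas Series.diff. The synthetic series and the seasonal period of 12 are assumptions chosen so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd

t = np.arange(48)
seasonal = np.tile([0, 5, 10, 5, 0, -5, -10, -5, 0, 3, 6, 3], 4)  # period 12
trend_only = pd.Series(2.0 * t)
trend_and_season = pd.Series(2.0 * t + seasonal)

# Trend: difference each value with the previous one.
diff_trend = trend_only.diff(1).dropna()

# Seasonality: difference each value with the one a full season earlier.
diff_season = trend_and_season.diff(12).dropna()

print(diff_trend.unique(), diff_season.unique())
```

In this noise-free example both differenced series are constant (2 per step and 24 per season), so nothing but a flat level is left. With real data the result would be checked with a plot or a stationarity test before moving on.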
Detection of stationary series
By seeing the plot
By statistical test
ADF test
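Null hypothesis (H0): the series has a unit root, i.e. |auto-correlation coefficient| = 1 (non-stationary)
A small p-value (p < 0.05) rejects H0 and suggests that the series is stationary
If H0 is not rejected (p >= 0.05), treat the series as non-stationary. Otherwise treat it as stationary
KPSS test
Tests whether the series is stationary around a level or a trend. H0 is stationarity, so here a small p-value (p < 0.05) suggests non-stationarity (the reverse of ADF)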
Auto correlation
Shift the values by N units, then take the correlation with the original series
Lag 1
Correlate after shifting by 1 unit
Lag 2
Correlate after shifting by 2 units
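The shift-then-correlate idea can be sketched with pandas. The alternating series is an assumption chosen so the answer is obvious: neighbouring values are opposite, values two steps apart are equal.

```python
import pandas as pd

series = pd.Series([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], dtype=float)

def lag_correlation(s, lag):
    # Shift the values by `lag` units, then correlate with the original.
    return s.corr(s.shift(lag))

lag1 = lag_correlation(series, 1)
lag2 = lag_correlation(series, 2)

# Series.autocorr performs the same shift-and-correlate computation.
same_as_lag1 = series.autocorr(lag=1)
print(lag1, lag2, same_as_lag1)
```

Here the lag 1 correlation is -1 and the lag 2 correlation is +1. Plotting this value for many lags is what plot_acf does.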
Useful packages
Pandas
matplotlib.pyplot
statsmodels.tsa.stattools
statsmodels.??.holtwinters
probplot
plot_acf (acf - Auto Correlation Function), plot_pacf (pacf - Partial Auto correlation Function)
from statsmodels.graphics.tsaplots import plot_acf
statsmodels.tsa.arima.model (named statsmodels.tsa.arima_model in older releases)
auto_arima
from pmdarima import auto_arima
from scipy import stats
Exponential smoothing types
Simple exponential smoothing
Has a single parameter, alpha
Used for a stationary series (no trend or seasonality)
Alpha lies between 0 and 1
Alpha is estimated by minimizing the MSE
Gives more weight to recent values than to old values
Formula: next forecast = alpha * (previous actual value) + (1 - alpha) * (previous predicted value)
Forecast
Point forecast
Range forecast
Formula - ??
Limitation
We can predict only one value at a time
Steps
Check if the series is stationary
Use model to calculate alpha
Predict by fitting the model
Calculate residuals (actual - predicted)
Model error - Good model will have error<0.05 (5 %)
Mean Square Error (MSE)
Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)
Mean Absolute Percent Error - Mean(Absolute Error/actual value)
<10% is reasonably good
Model adequacy test
Plot probplot of the residuals and check whether they are normally distributed
Model sufficiency
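The smoothing formula and the alpha-by-MSE step can be sketched in plain Python. This is a hand-rolled illustration, not the statsmodels implementation. The function names, the grid search and the choice of the first actual value as the first forecast are all assumptions.

```python
def ses_forecasts(values, alpha):
    """One-step-ahead forecasts; forecasts[t] predicts values[t].

    The extra last element is the forecast for the next, unseen period.
    Using the first actual value as the first forecast is an assumption
    of this sketch; other initialisations are common.
    """
    forecasts = [values[0]]
    for actual in values:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

def mse(values, forecasts):
    # Residual = actual - predicted, for the periods that have an actual value.
    residuals = [a - f for a, f in zip(values, forecasts)]
    return sum(r * r for r in residuals) / len(residuals)

def best_alpha(values, step=0.01):
    """Grid-search alpha in (0, 1] for the smallest in-sample MSE."""
    candidates = [round(step * k, 10) for k in range(1, int(round(1 / step)) + 1)]
    return min(candidates, key=lambda a: mse(values, ses_forecasts(values, a)))

history = [10, 20, 30]
print(ses_forecasts(history, 0.5))  # [10, 10.0, 15.0, 22.5]
```

On a steadily rising series such as [10, 20, 30, 40, 50] the search pushes alpha to 1.0, i.e. the forecast just repeats the last actual value and still lags behind. That is one way to see why simple exponential smoothing is meant for series without a trend.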
Double exponential smoothing
Has two parameters (one for level, one for trend)
Used for a series with a trend
Triple exponential smoothing
Has three parameters (level, trend and seasonality)
Used for a series with seasonality
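To make the "two parameters" of double exponential smoothing concrete, here is a hand-written sketch of Holt's linear trend method, one common form of double smoothing. The function name and the initial level and trend choices are assumptions. Triple smoothing adds a third, seasonal equation in the same style.

```python
def holt_forecast(values, alpha, beta, steps):
    """Double exponential smoothing (Holt's linear trend method).

    alpha smooths the level, beta smooths the trend. Initial level = first
    value and initial trend = first difference are assumptions of this sketch.
    """
    level = values[0]
    trend = values[1] - values[0]
    for actual in values[1:]:
        previous_level = level
        level = alpha * actual + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Unlike simple smoothing, forecasts keep moving along the trend.
    return [level + h * trend for h in range(1, steps + 1)]

history = [3.0, 5.0, 7.0, 9.0, 11.0]
print(holt_forecast(history, alpha=0.5, beta=0.5, steps=3))
```

For this perfectly linear history the forecasts continue the line (13, 15, 17), whereas simple exponential smoothing would stay flat.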
ARIMA (Auto regressive Integrated Moving Average)
Auto regressive -> how many past values the series is correlated with
Moving average -> how many past errors the series is correlated with
A more general way to develop a forecasting model
Exponential smoothing can be seen as a special case of ARIMA
The AR and MA parts assume a stationary series. The differencing step (the "Integrated" part) is what makes a non-stationary series stationary
Auto regressive means how many past values the series depends on
Terms of ARIMA
How many past values (phi1, phi2, ...) does it depend on? -> represented by p in ARIMA
How many error terms (theta1, theta2, ...) need to be added? -> represented by q in ARIMA
How many times is the series differenced to make it stationary? -> represented by d in ARIMA
Auto ARIMA
Develops 20-25 candidate models and selects the one that gives the minimum AIC or BIC (depending on the chosen criterion)
Model diagnostic check
Errors (residuals) should be white noise and should not be auto-correlated
If the p-value of the auto-correlation test on the residuals is > 0.05, there is no significant auto-correlation in the residuals
Partial correlation between X and Y
Reference