Making predictions about the future is called extrapolation in the classical statistical handling of time series data. More modern fields focus on the topic and refer to it as time series forecasting. If you want to understand the data science approach to prediction, this document will help.
Anything that is observed sequentially over time is a time series (Ref: example below).
Predictor variables are often useful in time series forecasting. For example, suppose we wish to forecast the hourly electricity demand (ED) of a hot region during the summer period. A model with predictor variables might be of the form
ED = f(current temperature, strength of economy, population, time of day, day of week, error)
The “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model.
Prediction can be framed as a regression problem, where we need to identify a curve that fits the past time series data and can be used to predict future values.
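As a minimal sketch of that regression framing, the snippet below fits an ordinary linear regression to synthetic hourly demand data. The predictor names (temperature, hour_of_day, day_of_week) and the generated numbers are illustrative assumptions, not the actual demand model from the example above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical hourly observations: temperature, hour of day, day of week -> electricity demand
rng = np.random.default_rng(0)
n = 24 * 90  # roughly three months of hourly data
df = pd.DataFrame({
    "temperature": rng.normal(30, 5, n),
    "hour_of_day": np.tile(np.arange(24), n // 24),
    "day_of_week": np.repeat(np.arange(n // 24) % 7, 24),
})
# Synthetic demand: rises with temperature, has a daily pattern, plus noise (the "error" term)
df["demand"] = (
    50
    + 2.5 * df["temperature"]
    + 10 * np.sin(2 * np.pi * df["hour_of_day"] / 24)
    + rng.normal(0, 5, n)
)

model = LinearRegression().fit(df[["temperature", "hour_of_day", "day_of_week"]], df["demand"])
next_hour = pd.DataFrame({"temperature": [34.0], "hour_of_day": [15], "day_of_week": [2]})
print(model.predict(next_hour))  # forecast demand for one hypothetical hour
```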
Trend
A trend is a long-term increase or decrease in the data (Ref: top right and bottom left pictures below).
Seasonal
A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency (Ref: top left picture below).
Cyclic
A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency (Ref: top left picture below).
Note that the bottom right picture above doesn't have any trend.
A stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
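To make trend, seasonality and stationarity concrete, here is a small sketch using statsmodels. The monthly series is synthetic (an assumed upward trend plus yearly seasonality), so the decomposition and the stationarity test are illustrative rather than results for any real data set.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

# Synthetic monthly series with an upward trend and yearly seasonality (illustrative only)
rng = np.random.default_rng(1)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
series = pd.Series(
    0.5 * np.arange(96)                              # trend
    + 10 * np.sin(2 * np.pi * np.arange(96) / 12)    # yearly seasonality
    + rng.normal(0, 2, 96),
    index=idx,
)

# Split the series into trend, seasonal and residual components
decomposition = seasonal_decompose(series, model="additive", period=12)
print(decomposition.trend.dropna().head())

# Augmented Dickey-Fuller test: a large p-value suggests the series is not stationary
print(f"ADF p-value: {adfuller(series)[1]:.3f}")
```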
Problem definition
A forecaster needs to spend time talking to everyone who will be involved in collecting data, maintaining databases, and using the forecasts for future planning.
Gathering information
At least two kinds of information are required: statistical data, and the accumulated expertise of the people who collect the data and use the forecasts.
Preliminary (exploratory) analysis.
Always start by graphing the data. Are there consistent patterns? Is there a significant trend? Is seasonality important? Is there evidence of the presence of business cycles? Are there any outliers in the data that need to be explained by those with expert knowledge?
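A minimal way to start that graphical inspection, assuming you already have a pandas series, is sketched below; the bundled Mauna Loa CO2 data from statsmodels simply stands in for whatever series you are forecasting.

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Any series will do; the bundled weekly CO2 data stands in for your own data here
co2 = sm.datasets.co2.load_pandas().data["co2"].dropna()

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
co2.plot(ax=axes[0], title="Full series: look for trend, outliers and level shifts")
co2.loc["1990":"1995"].plot(ax=axes[1], title="Zoomed in: look for repeating seasonal patterns")
plt.tight_layout()
plt.show()
```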
Choosing and fitting models.
It is common to compare two or three potential models against one another on the same data (see the comparison sketched below).
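Assuming ARIMA-type models (introduced later in this document) are among the candidates, one simple comparison is to fit each candidate to the same series and compare an information criterion such as AIC, where lower is better. The series here is synthetic and the candidate orders are arbitrary examples.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative synthetic series; replace it with the series you actually want to forecast
rng = np.random.default_rng(2)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
series = pd.Series(np.cumsum(rng.normal(0.3, 1.0, 200)), index=idx)

# Fit a few candidate models and compare them on AIC (lower is better)
for order in [(1, 1, 0), (0, 1, 1), (1, 1, 1)]:
    fit = ARIMA(series, order=order).fit()
    print(order, round(fit.aic, 1))
```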
Using and evaluating a forecasting model
The performance of the model can only be properly evaluated after the data for the forecast period have become available.
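For example, once the actual values for the forecast period are known, forecast accuracy can be summarised with error measures such as MAE and RMSE; the numbers below are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Stored forecasts versus the values that were actually observed (made-up numbers)
forecasts = np.array([102.0, 105.5, 108.0, 110.2])
actuals = np.array([100.3, 107.1, 109.4, 108.8])

mae = mean_absolute_error(actuals, forecasts)
rmse = np.sqrt(mean_squared_error(actuals, forecasts))
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")
```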
The least squares principle provides a way of choosing the coefficients effectively by minimising the sum of the squared errors. That is, we choose the values of β0,β1,…,βk that minimise
\[
\sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} \left( y_t - \beta_0 - \beta_1 x_{1,t} - \beta_2 x_{2,t} - \cdots - \beta_k x_{k,t} \right)^2 .
\]
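As a quick illustration of the least squares principle, the sketch below generates data from an assumed linear model with two predictors and lets statsmodels' OLS pick the coefficients that minimise the sum of squared errors above; the true coefficients (2.0, 3.0, -1.5) are arbitrary choices.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: y_t built from two predictors plus noise, with arbitrary true coefficients
rng = np.random.default_rng(3)
T = 100
X = rng.normal(size=(T, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, T)

# OLS chooses beta_0, beta_1, beta_2 to minimise the sum of squared errors
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.params)              # estimated beta_0, beta_1, beta_2
print((ols.resid ** 2).sum())  # the minimised sum of squared errors
```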
ARIMA stands for AutoRegressive Integrated Moving Average. The acronym is descriptive, capturing the key aspects of the model itself. Briefly, they are:
AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
I: Integrated. The use of differencing of raw observations (i.e. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
MA: Moving Average. A model that uses the dependency between an observation and residual errors from a moving average model applied to lagged observations.
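Putting the three pieces together, a minimal sketch of fitting an ARIMA model with statsmodels is shown below; the series is synthetic and the order (2, 1, 1) is an arbitrary example, not a recommended setting.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic daily series with a drifting level, so one round of differencing (d=1) is reasonable
rng = np.random.default_rng(4)
idx = pd.date_range("2022-01-01", periods=300, freq="D")
series = pd.Series(np.cumsum(rng.normal(0.2, 1.0, 300)), index=idx)

# order=(p, d, q): p lagged observations (AR), d differences (I), q lagged residual errors (MA)
result = ARIMA(series, order=(2, 1, 1)).fit()
print(result.summary())
print(result.forecast(steps=7))  # forecast the next 7 days
```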
References
https://otexts.com/fpp2/data-methods.html
https://machinelearningmastery.com/gentle-introduction-box-jenkins-method-time-series-forecasting/
https://machinelearningmastery.com/time-series-forecasting/
https://youtu.be/qnEZ1rF0H1Y?t=3691