In this project, we used time series analysis to predict busy traffic days at the San Diego airport (SAN).
Let us walk you through how it’s done!
1. We started with the naive approach, which uses the last value of the data set as the forecast for every future value. The results were awful, as you can see below.
Note: All the plots on this page show the daily ride-share count at the SAN airport. The blue part is the training set, orange is the validation (test) set, and green is our prediction.
2. Next, instead of taking just the last value, we averaged the last 10, 20, and 50 values. The forecast was just a bit better!
3. To improve things further, we decided to assign weights to the records, with higher weights to more recent records and lower weights to older records. We saw a bit more progress!
4. By now we had a feel for how forecasting works and wanted to bring in more statistics. We implemented Holt’s linear model, which uses the trend of the time series to make predictions.
5. We then realized that the data showed seasonality rather than a trend, so we implemented another statistical model, the Holt-Winters model. And BOOM! We finally saw a silver lining!
6. Now that we knew our statistical models had to account for the seasonality in the data, we tried a more powerful statistical forecasting model called SARIMA (seasonal autoregressive integrated moving average). We had to play a lot with the parameters, but the results were great, as you can see below.
7. Finally, we used this model to predict the daily traffic for December 2018, and below is the result!
2018-12-06 5982.870330
2018-12-07 6330.885789
2018-12-08 4261.815503
2018-12-09 6067.130448
2018-12-10 6727.546369
2018-12-11 5482.926322
2018-12-12 5277.611995
2018-12-13 5998.370081
2018-12-14 6346.385540
2018-12-15 4277.315255
2018-12-16 6082.630199
2018-12-17 6743.046121
2018-12-18 5498.426074
2018-12-19 5293.111747
2018-12-20 6013.869833
2018-12-21 6361.885292
2018-12-22 4292.815006
2018-12-23 6098.129950
2018-12-24 6758.545872
2018-12-25 5513.925825
2018-12-26 5308.611498
2018-12-27 6029.369584
2018-12-28 6377.385043
2018-12-29 4308.314757
2018-12-30 6113.629702
Note: The forecast above starts on December 6th and ends on December 30th because we ran the model on the 5th.
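To make steps 1 to 3 concrete, here is a minimal sketch of the three baseline forecasts in plain Python. It is not the project's code: the function names, the linearly increasing weights, and the sample counts are all assumptions chosen for illustration.

```python
def naive_forecast(history, horizon):
    """Step 1: repeat the last observed value for every future day."""
    return [history[-1]] * horizon


def moving_average_forecast(history, window, horizon):
    """Step 2: forecast every future day as the mean of the last `window` values."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


def weighted_average_forecast(history, window, horizon):
    """Step 3: weighted mean of the last `window` values.

    Weights grow linearly (1, 2, 3, ...) so the most recent value counts
    most. The linear scheme is an assumption; any increasing weights work.
    """
    recent = history[-window:]
    weights = range(1, len(recent) + 1)
    total = sum(w * v for w, v in zip(weights, recent))
    return [total / sum(weights)] * horizon


# Hypothetical daily counts for illustration only, not the project's data.
rides = [5200, 6100, 4300, 5900, 6500, 5400, 5100, 5800, 6200, 4400]
print(naive_forecast(rides, 3))
print(moving_average_forecast(rides, 5, 3))
print(weighted_average_forecast(rides, 5, 3))
```

All three produce a flat line into the future, which is why they struggle with data that rises and falls over the week.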
If you want to learn more about how this was done, then check out the code on GitHub.
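For a feel of what the seasonal model in step 5 does, below is a hand-rolled sketch of additive Holt-Winters smoothing in plain Python. It is a simplified illustration, not the project's implementation: the 7-day period, the smoothing parameters `alpha`, `beta`, and `gamma`, the crude initialization, and the sample data are all assumptions. Dropping the seasonal term leaves Holt’s linear model from step 4.

```python
def holt_winters_additive(series, period, horizon,
                          alpha=0.3, beta=0.1, gamma=0.2):
    """Additive Holt-Winters forecast; needs at least two full seasons."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    first = series[:period]
    second = series[period:2 * period]
    # Crude initial state (an assumption): level and seasonal offsets come
    # from the first season, trend from the change in the seasonal mean.
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonal = [value - level for value in first]

    for t in range(period, len(series)):
        s = seasonal[t % period]
        previous_level = level
        # Smooth the deseasonalized observation against the projected level.
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (series[t] - level) + (1 - gamma) * s

    n = len(series)
    # h-step-ahead forecast: level plus trend plus the matching seasonal offset.
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical weekly pattern repeated for four weeks, not the project's data.
week = [6000, 6300, 4300, 6100, 6700, 5500, 5300]
history = week * 4
print(holt_winters_additive(history, period=7, horizon=7))
```

On a series that repeats the same week exactly, the forecast reproduces that weekly shape, which is the behavior the flat baselines cannot capture.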