Goal
The project aims to improve COVID-19 trend prediction by adding exogenous (regressor) variables: vaccination rates, the Stringency Index, mobility data, GDP per capita, the Human Development Index, and diabetes prevalence. The models used are SARIMA, Facebook Prophet, TBATS, RNN, LSTM, and an SI (Susceptible-Infected) model.
Data
The main source is the Our World in Data COVID-19 dataset, compiled from the WHO COVID-19 dashboard. For the United States it contains 1674 days of data and 68 features, which were reduced to the 37 most important features. The project also uses the COVID-19 Twitter Social Mobility dataset, which provides a current mobility index and longitudinal mobility data for several cities and for every US state. Joining the mobility time series with the COVID time series gives a dataset of 1097 days and 38 features, running from January 5, 2020 to December 31, 2022.
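The join step can be sketched as an inner join on the date key: only dates present in both sources survive, and each surviving day carries the COVID features plus the mobility feature(s). This is a minimal, standard-library-only illustration; the dictionary-per-day layout, the column names `new_cases` and `mobility_index`, and the toy values are hypothetical, not the project's actual schema or data. With pandas DataFrames the equivalent would be a `merge` with `how="inner"` on the date column.

```python
from datetime import date
from typing import Dict

Rows = Dict[date, Dict[str, float]]  # hypothetical layout: date -> {feature: value}


def inner_join_by_date(covid: Rows, mobility: Rows) -> Rows:
    """Keep only dates present in both sources and merge their features."""
    joined: Rows = {}
    for day in sorted(covid.keys() & mobility.keys()):
        row = dict(covid[day])      # copy the COVID features for this day
        row.update(mobility[day])   # append the mobility feature(s)
        joined[day] = row
    return joined


# Toy rows with hypothetical column names, not real data.
covid = {
    date(2020, 1, 5): {"new_cases": 0.0},
    date(2020, 1, 6): {"new_cases": 1.0},
}
mobility = {
    date(2020, 1, 6): {"mobility_index": 0.8},
    date(2020, 1, 7): {"mobility_index": 0.7},
}
merged = inner_join_by_date(covid, mobility)
print(merged)  # {datetime.date(2020, 1, 6): {'new_cases': 1.0, 'mobility_index': 0.8}}
```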
Seasonal ARIMA (Auto-Regressive Integrated Moving Average)
When each variable is used on its own, the six exogenous variables rank as follows (most to least helpful):
New Vaccinations > Human Development Index > Diabetes Prevalence > GDP Per Capita > Mobility Data > Stringency Index
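The regressors enter a seasonal ARIMA model as a linear term. One standard formulation is the regression-with-SARIMA-errors form documented for the `SARIMAX` class in statsmodels, shown here without a deterministic trend term; the orders $(p,d,q)(P,D,Q)_s$ are generic placeholders, not the project's fitted configuration:

$$
y_t = \boldsymbol{\beta}^\top \mathbf{x}_t + u_t,
\qquad
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,u_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

Here $B$ is the backshift operator ($B y_t = y_{t-1}$), $s$ is the seasonal period, $\phi_p, \Phi_P$ and $\theta_q, \Theta_Q$ are the non-seasonal and seasonal AR and MA polynomials, $\varepsilon_t$ is white noise, and $\mathbf{x}_t$ holds the exogenous variables at time $t$ with coefficients $\boldsymbol{\beta}$. Running a variable in isolation corresponds to $\mathbf{x}_t$ having a single entry.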
Predictions were also made with several exogenous variables combined. Below are four such combinations used with the SARIMA model.
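The isolation ranking and the multi-variable runs can be organized as below. This is a hedged sketch: `evaluate` stands in for a hypothetical function that fits a model (SARIMA, Prophet, and so on) with the given regressors and returns a holdout error such as RMSE. The metric behind the ranking is an assumption, and the variable names and errors in the example are toy placeholders.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]


def rank_in_isolation(variables: Sequence[str],
                      evaluate: Callable[[Subset], float]) -> List[str]:
    """Fit one model per single regressor; sort by error, lowest (best) first."""
    scores = {v: evaluate((v,)) for v in variables}
    return sorted(variables, key=lambda v: scores[v])


def candidate_subsets(variables: Sequence[str], size: int) -> List[Subset]:
    """Every combination of `size` regressors to try together."""
    return list(combinations(variables, size))


def format_hierarchy(ranked: Sequence[str]) -> str:
    """Render a ranking in the 'A > B > C' style."""
    return " > ".join(ranked)


# Toy errors standing in for real model runs (illustrative numbers only).
toy_errors = {("a",): 3.0, ("b",): 1.0, ("c",): 2.0}
ranked = rank_in_isolation(["a", "b", "c"], lambda subset: toy_errors[subset])
print(format_hierarchy(ranked))               # b > c > a
print(candidate_subsets(["a", "b", "c"], 2))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Enumerating subsets exhaustively grows quickly with the number of variables, which is one reason to try only a handful of combinations.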
Facebook Prophet
When each variable is used on its own, the six exogenous variables rank as follows (most to least helpful):
GDP Per Capita > Diabetes Prevalence > Human Development Index > New Vaccinations > Stringency Index > Mobility Data
Predictions were also made with several exogenous variables combined. Below are four such combinations used with the Facebook Prophet model.
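For reference, Prophet models a series as a sum of components. In its default additive mode, each extra regressor registered with `Prophet.add_regressor` contributes a linear term (by default Prophet standardizes the regressor first). This is a simplified statement of that structure, not the fitted configuration used here:

$$
y(t) = g(t) + s(t) + h(t) + \sum_{j=1}^{J} \beta_j\, x_j(t) + \varepsilon_t
$$

where $g(t)$ is the trend, $s(t)$ the Fourier-series seasonality, $h(t)$ the holiday effects, $x_j(t)$ the value of the $j$-th exogenous variable, and $\beta_j$ its fitted coefficient. Because the regressors enter the forecast directly, their values must also be supplied for the future dates being predicted.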
TBATS
TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components) is a time series forecasting model designed for complex series with multiple seasonal periods. The graph below compares the TBATS prediction for the last 7 weeks of the data with the ground truth.
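Two of the components named in the acronym can be sketched directly: the Box-Cox transform, which stabilizes variance before modelling, and the trigonometric (Fourier) terms that represent a seasonal period compactly. This illustrates those building blocks only and is not a TBATS implementation (a full fit also estimates the trend, ARMA errors, and smoothing parameters); the weekly period of 7 and the harmonic counts are hypothetical choices for daily data.

```python
import math
from typing import List


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(z: float, lam: float) -> float:
    """Inverse Box-Cox transform (assumes lam * z + 1 > 0 when lam != 0)."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


def fourier_terms(t: int, period: float, k: int) -> List[float]:
    """Cosine/sine pairs for harmonics j = 1..k of one seasonal period."""
    terms = []
    for j in range(1, k + 1):
        angle = 2 * math.pi * j * t / period
        terms.extend([math.cos(angle), math.sin(angle)])
    return terms


# Round-trip a value through the transform and build weekly seasonal terms.
z = box_cox(42.0, 0.5)
print(round(inv_box_cox(z, 0.5), 6))  # 42.0
print(fourier_terms(0, 7, 2))         # [1.0, 0.0, 1.0, 0.0]
```

Using a few harmonics per period, rather than one parameter per day of the season, is what keeps long or non-integer seasonal periods tractable.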
SI Model
The plots show the susceptible (green) and infected (red) population proportions by week in 2021; the plot on the left also shows the vaccinated proportion (blue). At first, the infected proportion falls as vaccinations increase. It then rises again because of a new variant, before reaching another local minimum as vaccinations continue to increase.
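The equations behind the SI model are not given here, so the following is only a minimal discrete-time sketch under stated assumptions: time steps are weeks, `beta` is a hypothetical weekly transmission rate, and a hypothetical weekly vaccination fraction moves susceptibles into a vaccinated compartment. With no recovery term, `I` in this sketch is cumulative, so the effect of vaccination appears as a slower rise in `I` and fewer weekly new infections.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float, float]  # (susceptible, infected, vaccinated) proportions


def simulate_si(s0: float, i0: float, beta: float,
                vax_rates: Sequence[float]) -> List[State]:
    """Weekly discrete-time SI model with a vaccinated compartment.

    Each week, beta * S * I moves from S to I and v * S moves from S to V.
    Returns the state at week 0 and after every weekly step.
    """
    if s0 < 0 or i0 < 0 or s0 + i0 > 1:
        raise ValueError("need s0, i0 >= 0 and s0 + i0 <= 1")
    if not 0 <= beta <= 1 or any(not 0 <= v <= 1 - beta for v in vax_rates):
        raise ValueError("need 0 <= beta <= 1 and 0 <= v <= 1 - beta")  # keeps S >= 0
    s, i, vac = s0, i0, 1.0 - s0 - i0
    history = [(s, i, vac)]
    for v in vax_rates:
        new_infections = beta * s * i  # S -> I this week
        newly_vaccinated = v * s       # S -> V this week
        s -= new_infections + newly_vaccinated
        i += new_infections
        vac += newly_vaccinated
        history.append((s, i, vac))
    return history


# Hypothetical parameters: 20 weeks with and without 5% weekly vaccination.
no_vax = simulate_si(0.99, 0.01, beta=0.5, vax_rates=[0.0] * 20)
with_vax = simulate_si(0.99, 0.01, beta=0.5, vax_rates=[0.05] * 20)
print(with_vax[-1][1] < no_vax[-1][1])  # True: fewer people end up infected
```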
LSTM and RNN Model
Four versions were implemented:
Single-Layer RNN Model
Two-Layer RNN Model
Single-Layer LSTM Model
Two-Layer LSTM Model
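The difference between the single-layer and two-layer variants is stacking: the second recurrent layer reads the hidden-state sequence produced by the first. The dependency-free sketch below shows only the forward pass of a simple (Elman) RNN with tanh activation and random, untrained weights. It is not the project's model; the hidden size of 16, the 7-step window, and feeding all 38 features per step are hypothetical. An LSTM would replace the tanh cell with a gated cell but stack in the same way.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class RNNLayer:
    """Elman RNN layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.w_x = rand_matrix(hidden_size, input_size, rng)
        self.w_h = rand_matrix(hidden_size, hidden_size, rng)
        self.b = [0.0] * hidden_size
        self.hidden_size = hidden_size

    def forward(self, sequence: List[Vector]) -> List[Vector]:
        """Return the hidden state after every time step."""
        h = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            xs = matvec(self.w_x, x)
            hs = matvec(self.w_h, h)
            h = [math.tanh(a + c + bias) for a, c, bias in zip(xs, hs, self.b)]
            outputs.append(h)
        return outputs


def stacked_forward(layers: List[RNNLayer], sequence: List[Vector]) -> List[Vector]:
    """Feed each layer's hidden-state sequence into the next layer."""
    for layer in layers:
        sequence = layer.forward(sequence)
    return sequence


rng = random.Random(0)
n_features = 38  # hypothetical: one row of the joined dataset per time step
one_layer = [RNNLayer(n_features, 16, rng)]
two_layer = [RNNLayer(n_features, 16, rng), RNNLayer(16, 16, rng)]
window = [[0.1] * n_features for _ in range(7)]  # toy 7-step input window
last_hidden = stacked_forward(two_layer, window)[-1]
# A dense output layer (not shown) would map last_hidden to the forecast.
```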
The RNN_2_Layer model performed best overall, with the lowest RMSE (95,198.32) and a competitive MAE (36,336.57). The LSTM_1_Layer model had a slightly higher RMSE (98,318.51) but the lowest MAE (34,102.96). RNN_1_Layer and LSTM_2_Layer performed significantly worse, with higher RMSE and MAE values.
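RMSE and MAE can rank models differently because RMSE squares each error and so weights large misses more heavily, which is consistent with LSTM_1_Layer having the lowest MAE but not the lowest RMSE. The toy errors below are illustrative only and are not the project's predictions:

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


truth = [0.0, 0.0, 0.0, 0.0]
steady = [2.0, 2.0, 2.0, 2.0]  # every prediction off by 2
spiky = [0.0, 0.0, 0.0, 6.0]   # mostly exact, one large miss
print(rmse(truth, steady), mae(truth, steady))  # 2.0 2.0
print(rmse(truth, spiky), mae(truth, spiky))    # 3.0 1.5
```

The spiky model has the better MAE but the worse RMSE, the same kind of split seen between RNN_2_Layer and LSTM_1_Layer.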
Software Files
Paper Write-Up
Contributors
akshatkarwa@gatech.edu
mehulrastogi@gatech.edu