As the world approaches the end of the pandemic, we must shift into the next phase. Communities around the country have come together to focus on recovery. Vaccination efforts are underway and the end of the pandemic is near. The economic damage has been severe, but efforts have begun to move past the recession caused by the virus.
In Maryland, we are currently in a recession. Early data has shown that minority groups have borne the brunt of the damage caused by the virus. Inequality in treatment and recovery has become clear. As the state continues to move out of the recession, what can be done to deal with the inequalities that have come out of the pandemic?
The purpose of this project is to forecast economic recovery in the state of Maryland over the next few years. After conducting a time series analysis of current economic and COVID data, I will develop three machine learning models (VAR, LSTM, MLP) to forecast economic recovery efforts and compare the accuracy of each model.
What are the current economic conditions? How has the economy in the state responded to vaccination efforts?
How have different demographics responded to recovery efforts? Can I identify inequality in vaccinations or economic recovery? What can be done to help deal with these inequalities?
Can machine learning be used on a time series dataset to develop an accurate model that will predict economic metrics such as GDP? If so, as vaccination efforts progress, what can we expect to see in a few years?
By equally distributing vaccines, we can expect to reduce inequality across all demographics.
The files used in this project are located in the GitHub repository and were retrieved from the following sources. The primary features I will focus on are Cases, Vaccinations, Unemployment, and Consumption.
COVID-19 Cases and Vaccinations
https://coronavirus.maryland.gov/ ( Daily COVID-19 Cases in MD by Race )
https://coronavirus.maryland.gov/#Vaccine ( Daily COVID-19 Vaccinations in MD by Race )
Unemployment
https://www.bls.gov/cps/ ( Bureau of Labor Statistics Current Population Survey )
https://www.bls.gov/lau/ ( Bureau of Labor Statistics Local Area Unemployment )
Consumer Expenditure
https://fred.stlouisfed.org/tags/series?t=consumption+expenditures%3Bmonthly ( St Louis Federal Reserve Personal Consumer Expenditures )
https://www.bls.gov/cex/ ( Bureau of Labor Statistics Consumer Expenditure Survey )
Gross Domestic Product (GDP) in Maryland - Gross domestic product is a measure of the total output of Maryland's economy. The first chart shows the change in GDP over time. We can clearly see the large drop in early 2020 followed by the recovery. This shows the volatility in the economy over the past year and the strong impact COVID has had on it.
Federal Reserve System: Distribution of Wealth by Race - The second chart shows the total distribution of household wealth by race since 2015Q1. The chart clearly illustrates wealth inequality between different demographics prior to and during the COVID-19 pandemic. At the onset of the pandemic, all demographics experienced a decline in total wealth. However, over the course of the year, total wealth grew much faster for White Americans than for other races. This data shows that the pandemic had a negative economic impact and supports the claim of widening inequality.
Economists have speculated about what shape the recovery may take. The following are four possible outcomes.
V-Shaped Recovery - A form of recovery marked by a "V"-shaped chart. This is characterized by a sharp decline followed by a quick, sharp recovery. The recession of the U.S. economy between 1990 and 1991 shows what this may look like.
U-Shaped Recovery - A form of recovery marked by a "U"-shaped chart. This is characterized by a sharp decline followed by a stagnant period and then a slow recovery. The Great Recession in the U.S. between 2007 and 2009 shows what this may look like.
L-Shaped Recovery - A form of recovery marked by an "L"-shaped chart. This is characterized by a sharp decline followed by a long, stagnant recovery period. This is considered the worst-case scenario because recovery takes a long time.
K-Shaped Recovery - A new type of recovery model that was developed by William & Mary economist Peter Atwater, marked by a "K"-shaped chart. It is characterized by a diverging arm and leg for different demographics, specifically the inequality between different races. This indicates that recovery has been quick for some and vastly different for others. To put it simply, "the haves are largely back to where they were before the outbreak while, the have-nots have even less" (Atwater, 2020). Can we confirm that this is what is happening during the pandemic, and how can we fix it?
Figures: V-Shaped Recovery, U-Shaped Recovery, L-Shaped Recovery, K-Shaped Recovery
Time Series Data
The data used for this project consists of datasets from the federal and state governments. I combined four datasets into one for my time series and machine learning models. After the data cleaning phase, the final dataset contained 50 features and 366 rows. Essentially, the dataset is one year's worth of COVID, Unemployment, and Consumer Expenditure data for the state of Maryland from March 2020 through March 2021.
Distributions by Race
In these datasets, total values were reported separately from values by demographic. Therefore, after combining the datasets I calculated the distributions of each metric by race. These columns were added into a new dataframe and later used in the forecast models to analyze differences between each race.
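The distribution calculation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's code: the function name `shares_of_total`, the group labels, and the numbers are all hypothetical, and it assumes each group's share is simply its value divided by the separately reported total.

```python
def shares_of_total(group_values, reported_total):
    """Return each group's share of the reported total as a fraction (0 to 1)."""
    if reported_total <= 0:
        raise ValueError("reported_total must be positive")
    return {group: value / reported_total for group, value in group_values.items()}


# Hypothetical cumulative counts for a single day (not project data).
example_cases = {"group_a": 50, "group_b": 30, "group_c": 20}
example_total = 100  # total reported separately from the per-group values
example_shares = shares_of_total(example_cases, example_total)
```

Because the total is reported separately, the shares need not sum to exactly 1 when some records have no demographic information.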
Interpolation
Due to different reporting standards, the original dataframes were different lengths. Maryland COVID cases and vaccination rates are reported daily; however, unemployment and consumption data are only reported on a monthly basis. Because of this difference, I had to interpolate, or estimate, the economic data. I used linear interpolation to estimate daily economic data between the reported monthly statistics. This gave me a full year's worth of estimated economic data that I can use alongside the COVID data.
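To make the interpolation step concrete, here is a small dependency-free sketch. It is an illustration under assumptions, not the project's implementation (the references point to scipy.interpolate, which is not used here): the function name `interpolate_daily` and the monthly values are hypothetical, and each monthly figure is assumed to be dated on the first of the month.

```python
from datetime import date, timedelta


def interpolate_daily(monthly):
    """Linearly interpolate sorted (date, value) pairs to one value per day."""
    daily = {}
    for (d0, v0), (d1, v1) in zip(monthly, monthly[1:]):
        span = (d1 - d0).days  # number of days between the two observations
        for offset in range(span):
            day = d0 + timedelta(days=offset)
            # Move a fraction offset/span of the way from v0 to v1.
            daily[day] = v0 + (v1 - v0) * offset / span
    # The loop stops one day short of the final observation, so add it here.
    last_day, last_value = monthly[-1]
    daily[last_day] = last_value
    return daily


# Hypothetical monthly unemployment rates in percent (not project data).
monthly_rates = [
    (date(2020, 3, 1), 4.0),
    (date(2020, 4, 1), 10.2),
    (date(2020, 5, 1), 9.0),
]
daily_rates = interpolate_daily(monthly_rates)
```

Note that interpolated values are estimates: they smooth over any movement within a month, so daily economic figures produced this way carry no more information than the monthly reports.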
COVID Cases and Vaccinations
As of 03/01/2021, there were 382,702 total cases reported in Maryland. The distribution of the cases by race can be seen below. Over one third of reported cases were among White residents, closely followed by African Americans. The other demographics were far behind.
As for vaccinations, 1,332,588 people had received at least one dose of a COVID-19 vaccine. White Marylanders received nearly 50% of all vaccinations, while other races were well behind. There is a clear gap in vaccine distribution among different races. Could this inequality in vaccinations be causing a K-shaped recovery, and would a more uniform or targeted vaccine distribution help combat it? The next step is to review the economic data over the past year.
Unemployment and Personal Consumer Expenditures
The previous section showed the clear inequality in vaccine distribution between races. Now, what has been the economic impact? To measure the impact, I am analyzing the estimated daily unemployment rate and estimated personal consumer expenditures over the past year.
The unemployment rate measures the share of individuals in the labor force, age 16 and older, who are actively looking for work but unable to find a job. At its peak in April 2020, the overall unemployment rate in MD was 10.1%. When it is broken down by race, we can see that minority groups experienced greater rates of unemployment and continued to do so into 2021. On average, Black and Hispanic people experienced 3% greater unemployment compared to White and Asian individuals.
Personal Consumer Expenditures (PCE) measures the amount households spend on goods and services. The chart below illustrates the changes in PCE over the past year. When this metric is broken down by race, it follows a similar pattern to unemployment. Minority groups experienced greater drops in PCE and a slower recovery. This means that minorities in Maryland have experienced greater job and income loss throughout the pandemic. These results indicate that a K-shaped recovery is the most likely path at the current rate.
The next step is to run linear regressions on the independent and dependent variables to better understand the relationships between them. Specifically, I want to know whether the vaccine distribution rates affect the unemployment or consumption of each demographic. The following regression plots illustrate the relationships.
For minority groups, there was a positive relationship between unemployment and distribution and a negative relationship between distribution and consumption. As cases increased, job losses were greater and less money was spent by minority groups. Interestingly, for White and Asian residents of MD there was a negative relationship for both metrics. These regressions found that the metrics are correlated and further confirmed the differences in recovery. I can move forward with developing machine learning models to test my hypothesis.
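For readers who want to reproduce this kind of check, the sign of each relationship comes from the slope of a simple least-squares line. The sketch below is a plain Python illustration, not the code behind the plots: the function name `fit_line` and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values for one demographic (not project data).
vaccine_share = [0.10, 0.12, 0.15, 0.18, 0.20]
unemployment = [9.0, 8.6, 8.1, 7.4, 7.0]
slope, intercept = fit_line(vaccine_share, unemployment)
```

A positive slope corresponds to a positive relationship and a negative slope to a negative one. A slope on its own does not establish causation, which is why the relationships here are described as correlations.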
After completing the Data Cleaning and EDA phases, I created three different ML models to test my hypothesis. I adjusted the vaccine distribution rates for each demographic to make them more evenly distributed. For the African American and Hispanic demographics, I increased vaccinations by 20%. For the White and Asian demographics, I decreased vaccinations by 20%. I can then test the impact this change has on the unemployment and consumption metrics.
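The scenario adjustment described above can be expressed as a per-group multiplier. The following is a minimal sketch, assuming the 20% change is applied multiplicatively to each group's vaccination series. The function name `adjust_vaccinations`, the dictionary keys, and the baseline numbers are hypothetical; only the plus and minus 20% factors come from the text.

```python
def adjust_vaccinations(vaccinations, multipliers):
    """Scale each group's vaccination series by its scenario multiplier."""
    return {
        group: [value * multipliers.get(group, 1.0) for value in series]
        for group, series in vaccinations.items()
    }


# +20% for African American and Hispanic, -20% for White and Asian (from the text).
SCENARIO = {"african_american": 1.2, "hispanic": 1.2, "white": 0.8, "asian": 0.8}

# Hypothetical daily vaccination counts (not project data).
baseline = {
    "african_american": [100, 200],
    "hispanic": [50, 100],
    "white": [400, 500],
    "asian": [50, 100],
}
scenario = adjust_vaccinations(baseline, SCENARIO)
```

Groups without an entry in the multiplier table are left unchanged, which keeps the baseline and scenario datasets the same shape for the forecasting models.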
A Vector Autoregression (VAR) model is an autoregressive model that expresses each variable as a linear function of the lagged values of all variables in the system, which lets it learn the relationships between multiple time series.
When using a VAR model, the user must select the optimal number of lags. For this dataset, 3 lags was optimal because it produced the lowest AIC value.
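Written out, a VAR with 3 lags has the form below. The notation is standard textbook notation rather than anything taken from the project: y_t is the vector of K variables at time t, c is a vector of intercepts, each A_i is a K-by-K coefficient matrix, and epsilon_t is the error term. The second line is one common textbook form of the AIC used to compare lag orders p over T observations; software packages may count parameters slightly differently, so the exact values can differ by library.

```latex
\[
y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + A_3 y_{t-3} + \varepsilon_t
\]
\[
\mathrm{AIC}(p) = \ln \det \hat{\Sigma}_p + \frac{2\,p\,K^2}{T}
\]
```

Here \hat{\Sigma}_p is the estimated residual covariance matrix of the model with p lags. The first term rewards a better fit and the second penalizes the extra coefficients each additional lag introduces, so the lag order with the lowest AIC balances the two.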
Results
Small decline in unemployment and small increase in consumption for minority demographics.
Small increase in unemployment and decrease in consumption for White and Asian groups.
Based on the Mean Absolute Percentage Error (MAPE), the level of accuracy was 81% for this model.
Overall this is a good model, but the results were not as significant as I had hoped. Now, how will this compare to the neural network models?
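The accuracy figures in this report are derived from MAPE. The sketch below shows one way to compute it in plain Python. It assumes accuracy is reported as 100% minus MAPE, which is a common convention but is not stated explicitly in the project; the function names and sample values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)


def accuracy_from_mape(mape_percent):
    """Assumed convention: accuracy (%) = 100 - MAPE (%)."""
    return 100 - mape_percent


# Hypothetical held-out values and forecasts (not project data).
actual_values = [10.0, 8.0, 5.0]
forecast_values = [9.0, 8.8, 5.5]
example_mape = mape(actual_values, forecast_values)
example_accuracy = accuracy_from_mape(example_mape)
```

MAPE weights errors relative to the size of the actual value, so it is most informative when the series stays well away from zero, as unemployment rates and consumption levels generally do.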
An LSTM (Long Short-Term Memory) model is a type of Recurrent Neural Network. Whereas a standard RNN mostly retains information from recent inputs, an LSTM can preserve information from inputs over the long term. This has proven to be very effective for time series datasets.
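Before a network such as an LSTM can be trained on a time series, the series is typically reframed as pairs of an input window and the value that follows it. The sketch below illustrates that framing in plain Python. It is an assumption about a common preprocessing step, not a description of the project's pipeline, and the function name `make_windows` and the window length are hypothetical.

```python
def make_windows(series, n_steps):
    """Split a series into (input window, next value) training pairs."""
    if n_steps < 1 or n_steps >= len(series):
        raise ValueError("n_steps must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - n_steps):
        inputs.append(series[start:start + n_steps])  # the past n_steps values
        targets.append(series[start + n_steps])       # the value to predict
    return inputs, targets


# Hypothetical series (not project data).
example_series = [1, 2, 3, 4, 5, 6]
windows, next_values = make_windows(example_series, 3)
```

With a window of 3, the six-point series yields three training pairs: the first uses values 1 to 3 to predict 4, and so on. A longer window gives the model more history per sample at the cost of fewer samples.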
This model consists of two layers. After adjusting the parameters, the results indicated a high level of accuracy. The following plots are the results of this model.
Results
Small decline in unemployment and small increase in consumption for minority demographics. However, slightly greater changes compared to the previous model.
Small increase in unemployment and decrease in consumption for White and Asian groups. Slightly smaller gap when compared to the VAR Model.
Based on the Mean Absolute Percentage Error (MAPE), the level of accuracy was 94% for this model.
The results were consistent with the prior model: the economic metrics changed, but not very significantly. However, this model had a much better accuracy rate.
An MLP (Multilayer Perceptron) is a type of Artificial Neural Network that uses a set of inputs to generate a set of outputs. It is defined by at least three layers: an input layer, a hidden layer, and an output layer. The parameters of the model and results can be seen below.
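To illustrate the three-layer structure, the sketch below runs a single forward pass through a tiny MLP in plain Python. It is a toy illustration only: the weights, layer sizes, ReLU activation, and function names are all hypothetical and are not the parameters of the project's model, which was trained rather than hand-specified.

```python
def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def mlp_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer (ReLU) -> linear output layer."""
    hidden = relu(dense(inputs, hidden_w, hidden_b))
    return dense(hidden, output_w, output_b)


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output.
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, 2.0]]
output_b = [0.5]
prediction = mlp_forward([2.0, 1.0], hidden_w, hidden_b, output_w, output_b)
```

Unlike an LSTM, an MLP has no memory of earlier inputs, so any time dependence has to be supplied through the input features themselves, for example by passing in a window of lagged values.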
Results
Small decline in unemployment for minority demographics. However, this model was inefficient to run.
The forecasted results did not follow a linear pattern, which was inconsistent with the unemployment dataset. Ultimately, the forecasted results did not really make sense.
Compared to the previous models, the accuracy remained high and the errors were low.
The LSTM performed the best. The MAPE value indicated an accuracy rate of 94%. I observed that the unemployment rate dropped slightly for minority demographics while consumption increased. On the other hand, unemployment slightly increased and consumption decreased for Asian and White demographics.
Based on these models, higher vaccination rates would lead to more money and more jobs for minority groups. While the change was not as significant as I had hoped, there was still some improvement. The forecasted unemployment rate dropped by 2.2% from the original value. The consumption rate increased by more than 45%.
On the other hand, unemployment increased by about 1.3% and consumption decreased by nearly 40% for White and Asian demographics. The plots below illustrate the LSTM predictions from March 1st to July 1st. I will be able to confirm how well this model did when actual data is released soon!
I was able to confirm my hypothesis that vaccination rates can help fight the economic inequality that exists between different races in MD.
Hopefully this project will help others expand on these findings and conduct additional research. Possible future projects include the following:
Expand research to more demographics and metrics, such as age, gender, or income level.
Try additional machine learning methods.
Location of individuals or vaccination centers? See Travis Twigg's project "Determining Equitable COVID-19 Vaccination Site Locations Through Unsupervised Learning"
Allen, R. (2019, June). Keras LSTM Time Series. Stack Overflow. https://stackoverflow.com/questions/32514704/keras-lstm-time-series/35015167.
Atwater, P. (2020, June 10). The gap between the haves and the have-nots is widening sharply. Retrieved from https://www.ft.com/content/0ebfb7ca-a681-11ea-a27c-b8aa85e36b7e
Atwater, P. (2020, September). The K-Shaped Recovery A Narrative economics case study. Retrieved March 28, 2021, from https://www.linkedin.com/pulse/k-shaped-recovery-narrative-economics-case-study-peter-atwater?trk=read_related_article-card_title
Board of Governors of the Federal Reserve System. (2021). Retrieved February 18, 2021, from https://www.federalreserve.gov/releases/z1/dataviz/dfa/distribute/chart/#range:2010.1,2020.3;quarter:124;series:Net%20worth;demographic:race;population:all;units:levels
Brownlee, J. (2020, February 19). A gentle introduction to long short-term memory networks by the experts. Retrieved April 01, 2021, from https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
Brownlee, J. (2020, August 27). How to configure multilayer perceptron network for time series forecasting. Retrieved April 01, 2021, from https://machinelearningmastery.com/exploratory-configuration-multilayer-perceptron-network-time-series-forecasting/
Federal Reserve Economic Data: FRED: St. Louis Fed. (2021). Retrieved from https://fred.stlouisfed.org/
Gregory, V., Menzio, G., & Wiczer, D. (2020). Pandemic recession. National Bureau of Economic Research. Retrieved February 18, 2021, from https://www.nber.org/system/files/working_papers/w27105/w27105.pdf
Interpolation (scipy.interpolate). (2021, February 18). Retrieved February 27, 2021, from https://docs.scipy.org/doc/scipy/reference/interpolate.html
Maryland Department of Health. (2021). Retrieved February 18, 2021, from https://coronavirus.maryland.gov/#Vaccine
Prabhakaran, S. (2020, September 17). Vector autoregression (var) - comprehensive guide with examples in python. Retrieved April 01, 2021, from https://www.machinelearningplus.com/time-series/vector-autoregression-examples-python/
Raschka, S. (2016). Python Machine Learning (3rd ed.). Packt Publishing.
Saraiva, C. (n.d.). How a ‘K-Shaped’ Recovery Is Widening U.S. Inequality. Retrieved February 18, 2021, from https://www.bloomberg.com/news/articles/2020-12-10/how-a-k-shaped-recovery-is-widening-u-s-inequality-quicktake#:~:text=The%20divergence%20during%20the%20health,privilege%20on%20the%20other.%E2%80%9D%20More
Unemployment rates for states. (2021, April 16). Retrieved February 01, 2021, from https://www.bls.gov/web/laus/laumstrk.htm
Zhang L, Ghader S, Pack M, Darzi A, Xiong C, Yang M, Sun Q, Kabiri A, Hu S. (2020). An interactive COVID-19 mobility impact and social distancing analysis platform. medRxiv 2020. DOI: https://doi.org/10.1101/2020.04.29.20085472