COVID-19 and Politics
Scott Landholm, Shelby Parmer, Jocelyn Blander, Brandon Pitts, and Dannie Dilsaver
PSYC 500 Final Project
Introduction
2020 has been a critical year, marked by both a global pandemic and a crucial presidential election. But how do these two significant events relate? This project examines the relationship between the 2020 United States Presidential Election and COVID-19 prevalence and response:
First, by analyzing the relationship between COVID-19 positivity rates by state and state political party affiliation in the 2020 Presidential Election.
Then, by exploring the connection between COVID-19 positive cases and the number of daily COVID-19 deaths.
WHY IT IS AN ISSUE OF INTEREST:
COVID-19 has been at the forefront of Americans' daily lives for nearly 9 months, but the information about COVID-19 has not always been straightforward. Depending on the source of information, reports about COVID-19's severity varied considerably. The total numbers of active cases, hospitalizations, and deaths were not consistent from source to source. Additionally, the information was changing (and still is changing) as more people are exposed, tested, and treated. The inconsistency in COVID-19 reporting occurs at the national level, but is this inconsistency modulated by other factors?
PREVIOUS RESEARCH:
Previous studies conducted by the New York Times and Brookings have produced conflicting findings, making it clear that more research on the topic is necessary to reach a definitive conclusion.
To justify a study of all 50 states, this project included a mini-analysis of the total number of positive COVID-19 cases in just 10 states: 5 that have historically been blue states and 5 that have historically been red states. Based on the findings shown in the graph below, it was determined that a larger analysis had a good chance of producing significant results.
Data Curation and Ethics
WHERE IS THE DATA FROM?
This project uses data from The COVID Tracking Project. The COVID Tracking Project is a publicly available website providing the most recent, up-to-date information on COVID-19 tests, cases, and hospitalizations within the United States. The information is updated every day, providing current numbers on a wide range of COVID-19 related variables (anything from the number of positive tests to the overall number of COVID-related deaths in a day). Our project specifically uses a dataset called "historical values by state." We chose this dataset because it provides daily information at the state level on a range of COVID-19 variables.
In addition to The COVID Tracking Project data, we also incorporated data on election results for the 2020 US Presidential Election. More specifically, we used the Associated Press's data on which candidate/political party each state's electoral votes went to.
DATA ETHICS:
The COVID Tracking Project compiles all public data on COVID-19 and releases it in formats that are easy to read and interpret. The data it collects are the data released at both the state and federal levels. Each U.S. state has slightly different rules regarding data reporting, and these guidelines are accounted for when The COVID Tracking Project compiles data. Additionally, each state's individual rules regarding data reporting are published on The COVID Tracking Project website. Since this data is publicly accessible, there needs to be assurance that no private data is attached to any of the data files. The data The COVID Tracking Project provides is not linked to any private/personal information about specific patients/cases. The lack of personal/private information allows the COVID-19 data to be accessed by the public without concern for breaking any privacy laws.
Data Preparation and Exploration
Variables used in "Red" vs. "Blue" comparison:
State: Designated as either "Red" or "Blue" as determined by the 2020 United States presidential election and combined into 2 separate groups.
Mean Daily Covid 19 Positive Tests: The mean number of daily positive Covid 19 tests for each combined group, either "Red States" or "Blue States".
Daily Covid 19 Positive Tests Standard Deviation: The overall standard deviation from the mean for positive tests within each group state designation.
Mean Daily Total Covid 19 Tests: The mean number of total daily Covid 19 tests for each combined group, either "Red States" or "Blue States".
Daily Covid 19 Tests Standard Deviation: The overall standard deviation from the mean for total tests within each group state designation.
Mean Daily Covid 19 Ratio: The ratio of positive Covid 19 tests to total Covid 19 tests.
The following selections illustrate the data types for the data frames used in our comparison of "Red" and "Blue" states.
The size of the "Red" data frame is 25 rows, 6 columns. The "Blue" data frame is 26 rows, 6 columns.
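The group-level variables listed above could be computed along the following lines. This is only a minimal sketch, not the project's original code: the tiny data frame, the state codes, the `party` mapping, and the column names `positiveIncrease` and `totalTestResultsIncrease` are all assumptions made for illustration, and the ratio is taken as the ratio of the two group means, which may differ from how the project computed it.

```python
import pandas as pd

# Hypothetical miniature data set; every value here is made up for illustration.
daily = pd.DataFrame({
    "state": ["AA", "AA", "BB", "BB", "CC", "CC", "DD", "DD"],
    "positiveIncrease": [100, 140, 80, 60, 200, 220, 50, 70],
    "totalTestResultsIncrease": [1000, 1400, 2000, 1500, 2000, 2200, 1000, 1400],
})

# Hypothetical mapping from state code to the group assigned from election results.
party = {"AA": "Red", "BB": "Blue", "CC": "Red", "DD": "Blue"}
daily["group"] = daily["state"].map(party)

# Mean and standard deviation of daily positive and total tests per group.
summary = daily.groupby("group").agg(
    mean_positive=("positiveIncrease", "mean"),
    sd_positive=("positiveIncrease", "std"),
    mean_total=("totalTestResultsIncrease", "mean"),
    sd_total=("totalTestResultsIncrease", "std"),
)

# Ratio of mean positive tests to mean total tests (an assumed definition).
summary["mean_ratio"] = summary["mean_positive"] / summary["mean_total"]

print(daily.dtypes)   # data types of each column
print(daily.shape)    # (rows, columns)
print(summary)
```

Calling `dtypes` and `shape` on each group's data frame is one way to obtain the kind of type and size information described above.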
Bar Graph illustrating the difference in means between the "Red" states and "Blue" states.
The chart indicates a difference between the "Red" and "Blue" group means. This may be an indication that political affiliation has an impact on how the pandemic was managed.
Summary Data:
Model Building and Validation
Normality Results
The empirical CDFs for total deaths and total positive cases do not sit on the theoretical normal CDF, which indicates that neither total deaths nor total positive cases are normally distributed.
Below are the empirical and theoretical CDFs for Total Positive Test Results and Total Deaths due to COVID-19.
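A comparison of this kind can be sketched as follows. This is an illustrative sketch rather than the project's code: the `ecdf` helper is a hypothetical name, the skewed `totals` array is synthetic stand-in data, and the theoretical CDF is approximated by sampling from a normal distribution with the same mean and standard deviation.

```python
import numpy as np

def ecdf(data):
    """Return sorted values x and cumulative fractions y for a 1-D array."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

rng = np.random.default_rng(0)

# Synthetic right-skewed stand-in for per-state totals (not real data).
totals = rng.lognormal(mean=10.0, sigma=1.0, size=51)

# Empirical CDF of the data.
x_emp, y_emp = ecdf(totals)

# Theoretical CDF, approximated by many draws from a normal distribution
# with the same mean and standard deviation as the data.
normal_samples = rng.normal(totals.mean(), totals.std(), size=10_000)
x_theor, y_theor = ecdf(normal_samples)

# Evaluate the theoretical curve at the observed values and measure the
# largest vertical gap between the two curves.
theor_at_emp = np.searchsorted(x_theor, x_emp, side="right") / len(x_theor)
max_gap = np.max(np.abs(y_emp - theor_at_emp))
print(f"Largest gap between empirical and theoretical CDF: {max_gap:.3f}")
```

Plotting `x_emp, y_emp` as points over `x_theor, y_theor` as a line gives the visual check described above; a large gap between the curves suggests the data are not well described by a normal distribution.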
Linear Regression
Procedure: We converted two columns of the pandas DataFrame to NumPy arrays and generated a scatter plot. From there, the slope and intercept were determined using np.polyfit. The code can be seen below.
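The original code is not reproduced here, so the following is a minimal sketch of the procedure under stated assumptions: the column names `positive` and `death` and all values are hypothetical, chosen so that the fitted line is easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame; the values follow death = 0.02 * positive + 5 exactly.
df = pd.DataFrame({
    "positive": [1000, 2000, 4000, 8000, 16000],
    "death": [25, 45, 85, 165, 325],
})

# Convert the two columns to NumPy arrays.
positive = df["positive"].to_numpy()
death = df["death"].to_numpy()

# Fit a first-degree polynomial; np.polyfit returns the highest-order
# coefficient first, so the result unpacks as (slope, intercept).
slope, intercept = np.polyfit(positive, death, 1)

# Points along the fitted line, which could be overlaid on a scatter plot.
x_line = np.array([positive.min(), positive.max()])
y_line = slope * x_line + intercept

print(f"slope = {slope:.4f} deaths per positive test")
print(f"intercept = {intercept:.2f} deaths")
```

The slope is in units of deaths per positive test result, so a slope of 0.02 would correspond to 2 additional deaths per 100 additional positive results.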
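Results: Below is the linear regression relating positive test results to number of deaths. The fitted slope indicates that each additional positive test result is associated with about 0.024 additional COVID-19 deaths, that is, roughly 2.4 deaths per 100 positive test results.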
Bootstrap
Procedure: To determine whether there is a significant difference in positive test cases between red and blue states, a two-sample bootstrap test was conducted. Below is a portion of the bootstrapping code where a plot is generated showing both the shifted and original values.
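As a rough illustration of a two-sample bootstrap test for a difference of means, the sketch below shifts both groups to a common mean (which simulates the null hypothesis) and resamples each with replacement. The function names, the sample values, the number of replicates, and the choice of a two-sided p-value are all assumptions; the project's own code may differ, and plotting is omitted.

```python
import numpy as np

def bootstrap_replicates(data, func, size, rng):
    """Apply func to `size` resamples (with replacement) of data."""
    n = len(data)
    return np.array([func(rng.choice(data, size=n, replace=True))
                     for _ in range(size)])

def two_sample_bootstrap_p(a, b, n_reps=10_000, seed=0):
    """Two-sided bootstrap test of the hypothesis that a and b share a mean."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()

    # Shift both samples so they share the pooled mean (the null hypothesis).
    pooled_mean = np.concatenate([a, b]).mean()
    a_shifted = a - a.mean() + pooled_mean
    b_shifted = b - b.mean() + pooled_mean

    # Difference of means for each pair of bootstrap resamples.
    reps = (bootstrap_replicates(a_shifted, np.mean, n_reps, rng)
            - bootstrap_replicates(b_shifted, np.mean, n_reps, rng))

    # Fraction of replicates at least as extreme as the observed difference.
    p_value = np.mean(np.abs(reps) >= abs(observed))
    return observed, p_value

# Made-up values standing in for per-state mean positive cases.
red = [10, 12, 11, 13, 12, 11]
blue = [1, 2, 1, 3, 2, 1]
observed_diff, p = two_sample_bootstrap_p(red, blue)
print(f"observed difference = {observed_diff:.2f}, p = {p:.4f}")
```

When no replicate is as extreme as the observed difference, the computed p-value is 0; with `n_reps` replicates this is more accurately read as p < 1/`n_reps` than as exactly zero.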
Results:
The p-value of about 0.00 means that essentially none of the bootstrap replicates produced a difference of means as large as the one observed in the data. In other words, if mean positive COVID-19 cases in red and blue states were identical, a difference this large would be very unlikely. Because the number of replicates is finite, the p-value is better read as smaller than one over the number of replicates than as exactly zero.
Because of this p-value, we rejected the null hypothesis in favor of the alternative. (The null hypothesis is that the mean number of positive cases does not differ between red and blue states.) Our results indicate that there may be a significant difference in the number of positive cases between red states and blue states.
Discussion
Summary
Our objective for this project was to analyze how red and blue states were affected differently by COVID-19. We prepared data from The COVID Tracking Project by reducing it to the most important columns: state, positive tests, total tests, and deaths. We then sorted and combined the data by state and, using the 2020 election results, designated each state as either red or blue. Using this newly formed data, we ran multiple analyses, including finding means and standard deviations, creating bar graphs, completing a linear regression analysis, and running both a bootstrap pairs hypothesis test and a two-sample bootstrap test. With these results, we were able to reject our null hypothesis in favor of the hypothesis that there is a difference in positivity rates between red states and blue states.
Informed Insights
We were able to reject our null hypothesis after extensive analyses
Red states tend to have greater positivity rates than blue states
Limitations
A few limitations exist within the analysis of our data:
Not all COVID-19 cases are reported, and testing produces both false negatives and false positives.
When referring to “red and blue states,” we relied on the results of the 2020 election; however, these are not objective categories.
Some states didn’t report data to the source we had available.
COVID-19 death counts are contested, because it is not always known whether a death was caused directly by COVID-19 or by underlying conditions.
Future Research
With another spike of positive COVID-19 cases across the nation over the past few weeks, it may be of interest to see how individual states' positivity rates changed, comparing red to blue states. This could provide insight into which states mandate more protocols at times of increasing rates.
It may also be of interest to see whether swing states, those that switched from red to blue or from blue to red in the 2020 election, have COVID-19 rates that relate more closely to their previous or their current political affiliation.
Also, with a vaccine seemingly close, there may be an interest in studying vaccination rates between red and blue states.
Implications
The politics of the United States is a never-ending source of material for data science. It may be demonstrable that political affiliation does have an impact on how large-scale health emergencies are managed. The challenge is whether or not our political leadership will take heed of the discoveries derived from data science.