The Efficacy of COVID-19 Vaccinations

Mallory Holt and Leo Niehorster-Cook

PSYC 500 Final Project

Cover image: https://commons.wikimedia.org/wiki/File:Logo_covid-19_immunity_RGS.png

I: Data Curation and Ethics

  • Introduction:

This time last year, COVID-19 had reached pandemic levels worldwide, infecting and killing people in record-breaking numbers and setting off the race for a vaccine. Today, vaccines are rolling out by the millions all over the world, which makes it important to investigate how well each vaccine prevents infection. We want to analyze, locally and nationally, whether COVID-19 case rates are related to vaccination rates. We also want to see whether the number of people vaccinated against COVID-19 per day is increasing.


  • Data Curation:

The data were curated from the Centers for Disease Control and Prevention (CDC), specifically the COVID Data Tracker, and include COVID-19 case trends and vaccination trends over time for the United States. We also gathered county-level COVID-19 case and vaccination data for Kansas, likewise provided by the CDC.


  • Data Ethics:

The CDC collects its data from state, territorial, tribal, and local public health agencies and federal entities, which provide dose number, dose manufacturer, administration date, recipient ID, and date of submission. Patient anonymity is preserved because each patient is identified only by an ID number. No personal information is provided; only non-identifying information relevant to vaccinations is gathered.

II: Data Preparation

  • Variable types

    • The variables of interest included the number of cases and the number of vaccinations per county in Kansas as well as the number of cases and the number of vaccinations per state in the United States.

    • The variables are displayed below with their corresponding types, along with the definition of the function explore used to obtain this information.

      • All of the variables are listed for the data of interest, but not all of them are used for analyses.

  • Shape & Size of Data

    • The shape and size of each data set are displayed below, along with the data types reported by the function explore.

Function for Explore Defined:
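
The project's own definition of explore is not reproduced here. As a minimal sketch of what such a function might look like, the version below uses only the Python standard library and reports the shape of a table together with an inferred type for each column. The column names and the two sample rows are made up for illustration and are not taken from the CDC files.

```python
import csv
import io


def explore(rows):
    """Return ((n_rows, n_columns), {column: inferred type}) for a list of dict rows."""

    def infer(value):
        # Try the narrowest numeric type first, then fall back to text.
        for cast in (int, float):
            try:
                cast(value)
                return cast.__name__
            except ValueError:
                pass
        return "str"

    columns = list(rows[0].keys()) if rows else []
    types = {}
    for col in columns:
        seen = {infer(row[col]) for row in rows if row[col] != ""}
        # A column is only as narrow as its widest value.
        if "str" in seen:
            types[col] = "str"
        elif "float" in seen:
            types[col] = "float"
        elif "int" in seen:
            types[col] = "int"
        else:
            types[col] = "empty"
    return (len(rows), len(columns)), types


# Hypothetical two-row extract; the real column names may differ.
sample = io.StringIO(
    "county,cases_per_100k,pct_fully_vaccinated\n"
    "Douglas,45,31.2\n"
    "Riley,12,28.9\n"
)
rows = list(csv.DictReader(sample))
shape, types = explore(rows)
print(shape)
print(types)
```

Inferring the type from the values, rather than trusting the file, makes it easy to spot a numeric column that was read as text because of a stray label or footnote marker.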

  • Reshaped/Combined Data

    • The code required to merge the data for further analyses, including the linear regression analysis and the permutation hypothesis testing, is displayed below.
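
The merge step can be sketched as an inner join of the case table and the vaccination table on a shared territory column. This is only an illustration under assumptions: the helper name merge_on_key, the column names, and every number below are hypothetical, and the project itself may have used a data-frame library instead.

```python
def merge_on_key(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:  # keep only territories present in both tables
            merged.append({**row, **match})
    return merged


# Made-up illustrative rows, not CDC figures.
cases = [
    {"county": "Douglas", "cases_per_100k": 45.0},
    {"county": "Riley", "cases_per_100k": 12.0},
    {"county": "Ellis", "cases_per_100k": 8.0},
]
vaccinations = [
    {"county": "Douglas", "pct_fully_vaccinated": 31.2},
    {"county": "Riley", "pct_fully_vaccinated": 28.9},
]

merged = merge_on_key(cases, vaccinations, "county")
for row in merged:
    print(row)
```

An inner join drops any territory missing from either table, so the regression and the permutation test only ever see complete pairs of case rate and vaccination percentage.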

III: Exploratory Data Analysis

  • Null Hypotheses

    • 1.) Within a territory, the percentage of the population vaccinated is not correlated with the case rate.

      • 1A.) The percentage of a state's vaccinated population is not correlated with that state's case rate.

      • 1B.) The percentage of a county's vaccinated population is not correlated with that county's case rate.

    • 2.) The rate of vaccinations per day in the United States is constant (i.e., neither increases nor decreases with time).

  • Research Hypotheses

    • 1.) Within a territory, the percentage of the population vaccinated is positively correlated with the case rate.

      • 1A.) The percentage of a state's vaccinated population is positively correlated with that state's case rate.

      • 1B.) The percentage of a county's vaccinated population is positively correlated with that county's case rate.

    • 2.) The rate of vaccinations per day in the United States is linearly increasing.

III: Exploratory Data Analysis (cont.)

  • Distributional Analysis (Hypothesis 1)

    • Percent of Population Fully Vaccinated - The graphs below display the distribution of people vaccinated by territory. Both distributions, displayed in blue, are approximately normal, as is evident from their close fit to the matching artificially generated normal distribution, displayed in orange.

    • Case rate - Case rate shows a floor effect at the county level only. Across the country (i.e., hypothesis 1a), the floor effect is absent, but within Kansas (i.e., hypothesis 1b), the floor effect is strong, with over three-fourths of the counties showing that transmission has been suppressed.
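
The comparison against an artificially generated normal distribution, and the floor effect described above, can be sketched as follows. The function names are hypothetical and both data sets are made up. Sample skewness is used here as one simple numeric stand-in for the visual comparison; it is an assumption of this sketch, not necessarily the check the project used.

```python
import random
import statistics


def matched_normal_sample(values, seed=0):
    """Draw a normal sample with the same size, mean, and SD as `values`."""
    rng = random.Random(seed)
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in values]


def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a right tail."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Made-up values: a symmetric vaccination-like sample and a
# case-rate-like sample piled up at zero (a floor effect).
pct_vaccinated = [24, 26, 27, 28, 29, 30, 31, 32, 34]
case_rate = [0, 0, 0, 0, 0, 0, 1, 3, 12]

reference = matched_normal_sample(pct_vaccinated)
print(sorted(round(v, 1) for v in reference))
print(skewness(pct_vaccinated))
print(skewness(case_rate))
```

Plotting a histogram of the observed values over a histogram of the matched sample gives the visual comparison; a strongly positive skewness is one numeric symptom of values piling up at a floor.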

  • Hypothesis 1a

  • Hypothesis 1b

  • Distributional Analysis (Hypothesis 2)

      • For the date variable, the data are a priori linearly distributed. Thus, this distribution is not displayed.

      • The distribution of the number of people becoming fully vaccinated per day deviates strongly from normality and also exhibits a floor effect.

  • Descriptive Statistics (Hypothesis 1a, 1b)

  • Descriptive Statistics (Hypothesis 2)

IV: Model Building/Validation

  • Functions Required for Analyses Defined:

  • Linear Regression Analysis (Hypothesis 1)

    • The linear regressions relating COVID-19 cases and vaccinations for the United States and for Kansas are displayed below, each with its interpretation.
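
A simple linear regression of this kind can be sketched with ordinary least squares. The helper name linear_fit is hypothetical, and the four data points are made up so that the fit is easy to check by hand; they are not project data.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 2x + 1.
pct_vaccinated = [1.0, 2.0, 3.0, 4.0]
case_rate = [3.0, 5.0, 7.0, 9.0]

slope, intercept = linear_fit(pct_vaccinated, case_rate)
print(slope, intercept)
```

The sign of the slope is what the interpretations read off: a positive slope means territories with a higher vaccination percentage also show a higher case rate in the sample.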

  • Interpretation:

    • As predicted by the research hypothesis, the number of COVID-19 vaccinations is positively correlated with the number of cases in the United States.

  • Interpretation:

    • As predicted by the research hypothesis, the linear regression analysis shows that the number of COVID-19 vaccinations is positively correlated with the number of cases in Kansas.

  • Linear Regression Analysis (Hypothesis 2)

    • The time-series data are displayed below, together with a linear regression and its interpretation.
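
To regress a daily count on time, the dates first have to be turned into numbers. The sketch below converts ISO dates to days elapsed since the first observation and then fits a least-squares slope. The function names and all values are hypothetical; the counts are made up and rise by exactly 1,000 per day so that the result is easy to verify.

```python
from datetime import date


def days_since_start(iso_dates):
    """Convert ISO date strings to integer days elapsed since the earliest date."""
    parsed = [date.fromisoformat(d) for d in iso_dates]
    start = min(parsed)
    return [(d - start).days for d in parsed]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up weekly snapshots of people becoming fully vaccinated per day.
dates = ["2021-01-01", "2021-01-08", "2021-01-15", "2021-01-22"]
newly_fully_vaccinated = [1000, 8000, 15000, 22000]

elapsed = days_since_start(dates)
trend = ols_slope(elapsed, newly_fully_vaccinated)
print(elapsed)
print(trend)
```

Under null hypothesis 2 the slope would be close to zero; a positive slope means the daily count is growing over time.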

  • Interpretation:

    • As predicted by the research hypothesis, as time passes, more people become fully vaccinated.

    • However, there appears to be a strong deviation from this trend in the most recent weeks.

  • Permutation Hypothesis Testing (Hypotheses 1 and 2)

    • The functions for analyzing the Pearson correlation are displayed below along with the Pearson correlations for COVID-19 cases vs vaccinations as well as their interpretations.
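
A permutation test of a Pearson correlation can be sketched as below. The function names, the default number of permutations, and the data are all assumptions made for illustration. The test is written as one-sided because the research hypotheses predict a positive correlation; whether the project's own test was one-sided or two-sided is not stated in the text.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """One-sided p-value: share of shuffles whose r is at least the observed r."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if pearson_r(x, shuffled) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Made-up, perfectly correlated data, so the p-value should be very small.
x = list(range(10))
y = [2 * v + 1 for v in x]

r, p = permutation_p_value(x, y, n_permutations=2_000)
print(r, p)
```

Shuffling one variable simulates the null hypothesis of no association, so the p-value is simply the fraction of shuffles that produce a correlation at least as strong as the observed one. A reported p of 0.00 therefore means that essentially none of the shuffles did.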

U.S. Cases/Vaccinations Pearson Correlation:

Kansas Cases/Vaccinations Pearson Correlation:

Hypothesis 1a

Hypothesis 1b

  • Interpretation:

    • The permutation testing yielded p = 0.1758.

    • The null hypothesis cannot be rejected at the conventional α = 0.05 significance level.

  • Interpretation:

    • The permutation testing yielded p = 0.1898.

    • The null hypothesis cannot be rejected at the conventional α = 0.05 significance level.

  • Interpretation:

    • The permutation testing yielded p = 0.00.

    • The null hypothesis can be rejected at the conventional α = 0.05 significance level.

V: Discussion

  • Project Objective:

With COVID-19 vaccinations rolling out, the purpose of this project was to investigate the efficacy of COVID-19 vaccinations in the United States and, locally, in Kansas. We wanted to see if there was a negative correlation between COVID-19 cases and vaccinations in order to determine whether the reduction in COVID-19 cases was related to the increase in vaccinations. We gathered data from the CDC's COVID Data Tracker, which provided non-identifying information on COVID-19 cases and vaccinations statewide and nationwide. Some of the data had to be merged before specific distributions and analyses could be visualized effectively. We also defined several functions to analyze the numbers of COVID-19 cases and vaccinations over time: one for checking the normality of a distribution, one for linear regression analysis, and one for permutation hypothesis testing to determine whether our results were significant.


  • Results of Analyses:

We were unable to find support for hypotheses 1a and 1b at the 95% confidence threshold. However, effects in the predicted direction were observed in both cases. Possibly, with larger samples, evidence could be found for the research hypothesis that, within a territory, the percentage of the population vaccinated is positively correlated with the case rate at the state and county levels. Hypothesis 2 was supported with near-certainty (p = 0.000 in a permutation test with 100,000 samples). This means we reject the null hypothesis that the rate of vaccinations per day in the United States remains constant over time.


  • Limitations:

These populations include only individuals aged 16 and older, as the COVID-19 vaccine is not offered to anyone younger. Within the United States, the smaller the territory, the noisier the data. Correlations are more likely to be detected in larger territories with more individuals in the sample; they are therefore more likely to appear at the state and national levels than at the county or city level. Additionally, when analyzing county-level cases and vaccinations for Kansas, it is important to know that the counties differ substantially in population: several counties have very low populations spread over areas similar in size to those of the few counties with much larger populations. These territorial and population differences make it difficult to generalize to other states, such as California, which has a very different population distribution. However, because both data sets showed effects in the predicted direction, even though neither reached statistical significance, it is important to know, for future research in the United States, that efforts to prevent COVID-19 are not unfounded. It is also important to understand how the case and vaccination distributions differ in other countries with different population distributions and healthcare systems; these results may not generalize to countries with reduced access to healthcare and larger populations in territories of different sizes.


  • Implications:

We recommend further research in other countries with larger populations, differently sized territories, and different healthcare systems, given the United States' unique incidence of COVID-19 and access to vaccine distribution. It is important that COVID-19 vaccinations are analyzed alongside reported cases in other countries, given most of the world's limited vaccine supply and other factors that may be contributing to the case rate. It is also important to continue research on vaccine distribution in other countries in order to determine which areas need improvement and aid from other nations.

Data Source: https://covid.cdc.gov/covid-data-tracker/#datatracker-home