Coefficient: 0.00997
Intercept: 151679.8483
Max Error: 182489.5960
μ Abs. Error: 84159.5211
seaborn.regplot()
scikit-learn
The cumulative number of cases on each date is plotted against the cumulative number of deaths on that date
The black line shows a linear regression fitted with scikit-learn
The red line shows a linear regression with a 90% confidence interval, drawn by seaborn
The pink line shows a polynomial-like regression evaluated on numbers that are not part of the original dataset
The green line shows a polynomial-like regression evaluated on the numbers used to train the model (it cannot be seen because it traces exactly the same curve as the pink line)
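A minimal sketch of how fits like these could be produced with scikit-learn. The data values, the degree-2 polynomial, and all variable names are assumptions made for illustration; they are not the dataset or the exact model behind the graph.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data (NOT the real dataset):
# cumulative cases in millions, cumulative deaths in thousands.
cases = np.array([1.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
deaths = np.array([60.0, 150.0, 240.0, 300.0, 480.0, 550.0, 600.0])

X = cases.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Straight-line fit (the role played by the black line).
linear = LinearRegression().fit(X, deaths)

# Polynomial fit; degree 2 is an assumption, the notes do not state the degree.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, deaths)

# "Green" curve: the polynomial evaluated on the inputs used for training.
train_curve = poly.predict(X)

# "Pink" curve: the same polynomial evaluated on a dense grid of case counts,
# most of which do not appear in the training data.
new_cases = np.linspace(cases.min(), cases.max(), 50).reshape(-1, 1)
new_curve = poly.predict(new_cases)
```

Both curves come from the same fitted model, so their points lie on one curve, which is why the green line is hidden behind the pink one. For the red line, seaborn's `regplot` accepts a `ci` argument, so `ci=90` requests a 90% confidence band.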
Given the fitted values, one can hypothesize that approximately 1% of all cases result in death (coefficient 0.00997, or 0.997%), on top of an intercept of approximately 151,680 deaths
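As a worked example, the fitted line can be evaluated directly from the reported coefficient and intercept. The function name and the 10 million case input are illustrative choices, not part of the original analysis.

```python
COEFFICIENT = 0.00997    # reported slope: deaths per additional cumulative case
INTERCEPT = 151679.8483  # reported intercept: deaths predicted at zero cases


def predict_deaths(cumulative_cases: float) -> float:
    """Evaluate the fitted line: deaths = coefficient * cases + intercept."""
    return COEFFICIENT * cumulative_cases + INTERCEPT


# The slope expressed as a percentage of cases.
print(f"{COEFFICIENT:.3%}")                # prints 0.997%

# Predicted cumulative deaths at 10 million cumulative cases.
print(round(predict_deaths(10_000_000)))   # prints 251380
```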
The model's fit is further assessed with a basic maximum error and mean absolute error analysis, which gives approximately 182,490 and 84,159.5 deaths respectively
Given such low values compared to the actual dataset, the polynomial regression may be deemed a suitable model for describing the correlation
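The two error measures can be sketched in plain Python as follows. The sample values are hypothetical and only show the arithmetic. Dividing the mean absolute error by the largest observed value is one assumed way of judging whether an error is "low" relative to the data. scikit-learn also ships `sklearn.metrics.max_error` and `sklearn.metrics.mean_absolute_error`, which compute the same quantities.

```python
def max_error(actual, predicted):
    """Largest absolute difference between an observed and a predicted value."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical cumulative death counts (NOT the real dataset).
actual = [100_000, 250_000, 400_000, 600_000]
predicted = [120_000, 240_000, 450_000, 590_000]

worst = max_error(actual, predicted)             # 50000
mae = mean_absolute_error(actual, predicted)     # 22500.0

# Put the error in context by comparing it to the scale of the data.
relative_mae = mae / max(actual)                 # 0.0375, i.e. 3.75%
```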
When the intercept is removed from the equation, the regression line gives a more accurate representation of cases versus deaths for more recent dates
This makes sense due to the nature of the curve
It may be better to plot a logistic or polynomial regression line (later added and shown in the current graphs)
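Removing the intercept amounts to forcing the line through the origin (zero cases, zero deaths). The sketch below compares the two least-squares fits in closed form on hypothetical numbers. The function names and data are assumptions. In scikit-learn the equivalent option is `LinearRegression(fit_intercept=False)`.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(x, y):
    """Least squares for y = slope * x, with the intercept fixed at zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data (NOT the real dataset): cases in millions, deaths in thousands.
cases = [1.0, 2.0, 3.0, 4.0]
deaths = [30.0, 35.0, 50.0, 85.0]

slope, intercept = fit_with_intercept(cases, deaths)   # slope 18, intercept 5
origin_slope = fit_through_origin(cases, deaths)       # 590 / 30, about 19.67
```

With the intercept fixed at zero, the slope alone has to account for every death, so the line steepens. That matters most at the largest case counts, which are the most recent dates.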
The plotted linear regression line cannot be used to accurately predict the number of deaths per case, as patients who are COVID-19 positive react differently to the virus. Additionally, those who have pre-existing health conditions (especially cardiovascular and respiratory) or a previously compromised immune system have a higher chance of severe complications, another variable that must be taken into consideration when predicting the mortality rate of the virus
Neither prediction method takes into account vaccinations and their effect on the total number of positive cases
Shown above is a model predicting the number of deaths from the number of cases
The graph was created by randomizing N values of total cases and predicting the number of deaths for each from the current data
N is set to the number of days, year to date, so the prediction data includes the same number of points as the recorded data
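The procedure described above can be sketched as follows. The uniform sampling, the upper bound on cases, the fixed seed, and the use of the reported linear coefficients as the predictor are all assumptions made for illustration. The original graph may have used a different sampling scheme and the polynomial-like model instead.

```python
import random

COEFFICIENT = 0.00997    # reported slope
INTERCEPT = 151679.8483  # reported intercept


def simulate_predictions(n_points, max_cases, seed=0):
    """Draw n_points random cumulative case counts and predict deaths for each.

    Returns a list of (cases, predicted_deaths) pairs sorted by cases.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled_cases = sorted(rng.uniform(0, max_cases) for _ in range(n_points))
    return [(c, COEFFICIENT * c + INTERCEPT) for c in sampled_cases]


# Hypothetical N: one simulated point per recorded day, year to date.
recorded_days = 150
points = simulate_predictions(recorded_days, max_cases=30_000_000)
```

Matching N to the number of recorded days keeps the predicted series and the recorded series the same length, so the two can be plotted point for point.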
There is an apparent steep increase in deaths after a cumulative total of about 14 million cases: the number of deaths doubles between 10 million and 20 million cumulative cases
The original prediction graph turned out to match what happened in reality: it was able to predict the relative number of deaths given the steep increase in cases
After the vaccine rollout and the large decrease in cases, the prediction graph has taken a different shape, as depicted in the one above
The prediction is not accurate for the actual future of the novel coronavirus, as its effect varies between individuals and many variables, such as immune response and healthcare outreach, are unaccounted for
Cumulative Cases, Deaths, and Tests from January 1, 2020 to June 2023
Cumulative Fully Vaccinated, Single Vaccinated, and Booster from January 1, 2020 to June 2023