Title: Applying Principal Components & Discriminant Analysis to Covid-19 Data
Abstract:
We apply Principal Components Analysis and Discriminant Analysis to New York State (NYS) region- and county-level Covid-19 metrics in order to (1) assess which metrics best separate the 62 NYS counties and 8 regions, and (2) obtain a discriminant function that classifies every NYS county and region into a high-risk or low-risk group according to its Covid-19 metrics. We use several existing classification variables, including (1) percent positives per 10K (of county population) and (2) percent deaths per 10K. Because population density per county is not available to us, we also use (3) a subjective, predominantly Urban/Rural status for each county. Through Principal Components/Factor analyses, we first establish which variables most significantly differentiate high-infection from low-infection counties, and then obtain PC scores as an additional variable for differentiating among regions and counties. Using these variables, we then develop several discriminant functions and classify the counties into two groups (high and low risk) according to their Covid-19 metrics.
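The two-stage pipeline described above (PC scores first, then discriminant functions on the chosen variables) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name its software, so PCA is assumed here to be an eigen-decomposition of the correlation matrix, and each discriminant function is assumed to be a two-group Fisher linear discriminant with a midpoint (equal-prior) cutoff. The helper names (`standardize`, `pca_scores`, `fisher_lda`, `classify`) are hypothetical, and the county data are synthetic placeholders, not real NYS figures.

```python
import numpy as np

def standardize(X):
    """Z-score each column (mean 0, sample standard deviation 1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_scores(X, k=1):
    """First k principal-component scores of the standardized data, plus all eigenvalues."""
    Z = standardize(X)
    # The covariance of z-scores is the correlation matrix; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    loadings = eigvecs[:, order[:k]]
    return Z @ loadings, eigvals[order]

def fisher_lda(X, y):
    """Two-group Fisher linear discriminant: weight vector w and midpoint cutoff c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance; atleast_2d handles the single-variable case.
    S0 = np.atleast_2d(np.cov(X0, rowvar=False))
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    Sw = ((len(X0) - 1) * S0 + (len(X1) - 1) * S1) / (len(X) - 2)
    w = np.linalg.solve(Sw, m1 - m0)
    c = w @ (m0 + m1) / 2  # cutoff halfway between the projected group means
    return w, c

def classify(X, w, c):
    """1 = high risk when the discriminant score exceeds the cutoff, else 0 = low risk."""
    return (X @ w > c).astype(int)

# HYPOTHETICAL synthetic data for 62 counties. Columns: positives per 10K,
# deaths per 10K, Urban (1) / Rural (0) flag. These are NOT real NYS figures.
rng = np.random.default_rng(0)
n_low, n_high = 40, 22
low = np.column_stack([rng.normal(50, 10, n_low), rng.normal(1.0, 0.3, n_low),
                       (rng.random(n_low) < 0.3).astype(float)])
high = np.column_stack([rng.normal(90, 12, n_high), rng.normal(2.0, 0.4, n_high),
                        (rng.random(n_high) < 0.7).astype(float)])
X = np.vstack([low, high])
y = np.array([0] * n_low + [1] * n_high)

# Discriminant function 1: the three raw metrics.
w_raw, c_raw = fisher_lda(X, y)
acc_raw = (classify(X, w_raw, c_raw) == y).mean()

# Discriminant function 2: the first PC score alone.
pc1, eigvals = pca_scores(X, k=1)
w_pc, c_pc = fisher_lda(pc1, y)
acc_pc = (classify(pc1, w_pc, c_pc) == y).mean()

print(f"raw-metric LDA accuracy: {acc_raw:.2f}, PC1 LDA accuracy: {acc_pc:.2f}")
```

The PC1 score is fed to its own discriminant function rather than stacked with the raw metrics because it is an exact linear combination of those (standardized) metrics; including both would make the pooled within-group covariance matrix singular. Accuracies here are resubstitution (training-set) rates on synthetic data and say nothing about the real county results.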