"The absence of unfair and avoidable or remediable differences in health among population groups defined socially, economically, demographically or geographically" (World Health Organization, 2011).
In the United States and worldwide, the COVID-19 pandemic has "magnified the intersectional and pervasive impacts of social and structural inequities in society. COVID-19 is having a disproportionate impact on people who are already disadvantaged by virtue of their race and ethnicity (see Figure 1), age, health status, residence, occupation, socioeconomic conditions, and/or other contributing factors" (Williams and Cooper, 2020).
"Given the legacies of inequality, injustice, and discrimination that have undermined the health and well-being of certain populations in the United States for centuries, considerations of equity should factor into plans for allocating and distributing COVID-19 treatments and vaccines to the population at large" (Essien et al., 2020).
So far, many state and local authorities have determined vaccine distribution sites based on population density or health care infrastructure availability (see Figure 2), without taking into account the disproportionate experience of the COVID-19 pandemic or the lack of access to health care facilities for many of these communities.
Figure 1: COVID-19 is hitting certain communities harder than others. (Li, W., 2020)
Figure 2: In Fulton and DeKalb, GA (left), one fifth of vaccination sites are located in Black neighborhoods, and many of these are grocery store pharmacies with short supply. In Travis and Bastrop, TX (right), vaccination sites are far from many Hispanic and immigrant communities. (McMinn, S., 2021)
This project aims to use an unsupervised weighted centroid-based clustering algorithm and neighborhood level demographic data to determine equitable vaccination distribution sites in six counties across five states that were previously identified by NPR as containing the largest vaccination disparity between majority-White communities and majority-BIPOC communities. The final deliverable will be an interactive web app that will allow users to input parameters and compare the vaccination distribution maps.
In this project, I am using a combination of tabular demographic survey data gathered by the U.S. Census Bureau, GIS spatial boundary data, and current vaccination location data.
The demographic survey data and relevant spatial boundary data is sourced from the National Historical Geographic Information System (NHGIS) database, which provides easy access to U.S. census data for all geography levels. Current vaccination location data is sourced from various state databases connected to state health department sites.
The demographic survey data used in this project was collected as part of the 2019 American Community Survey (ACS), which aims to collect demographic data throughout the United States on a yearly basis. The data is stored in CSV files and contains block-group-level population data for different racial and ethnic groups.
Figure 3: Hierarchy of census geographic entities (Caliper, 2020)
The U.S. Census aggregates data at many different geographic levels. This project uses data at the block group level, which is the smallest geographic level at which ACS data is available. For the states evaluated in this project, the median number of households per block group was 936.
Geographic boundary data is stored in shapefile format. A shapefile is a common standard for representing geospatial vector data that has features (unique polygons) and attribute information (population, income, etc.). To the right you can see a standard shapefile of Baltimore City, as well as the same shapefile after joining with the demographic survey data.
Once the demographic survey data has been joined to the geographic boundary shapefiles, the centroid of every block group is determined. Each centroid carries the same attribute data as its neighborhood, along with a unique latitude/longitude coordinate. It is these neighborhood centroids that this project clusters to determine better vaccination sites.
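As an illustration of the centroid step, here is a minimal sketch that computes the area-weighted centroid of a single polygon ring with the shoelace formula and attaches it to the block group's attributes. It is not the project's code: a GIS library would normally handle this, and the record layout (`geoid`, `total_pop`, `ring`) and coordinate values are hypothetical. Treating longitude/latitude as planar coordinates is an approximation that is reasonable only for small polygons such as block groups.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical block group: attribute data plus a boundary ring of (lon, lat).
block_group = {
    "geoid": "hypothetical-001",
    "total_pop": 1200,
    "ring": [(-76.62, 39.29), (-76.60, 39.29), (-76.60, 39.31), (-76.62, 39.31)],
}

lon, lat = polygon_centroid(block_group["ring"])
# The centroid record keeps the neighborhood's attributes and adds coordinates.
centroid_record = {k: v for k, v in block_group.items() if k != "ring"}
centroid_record.update({"lon": lon, "lat": lat})
```

The resulting `centroid_record` is the kind of point that the clustering step takes as input: one coordinate pair plus the attributes used for weighting.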
Current vaccination locations are sourced from state health department sites. Each state has its own way of making this information available to the public. Maryland, for example, uses an ArcGIS map to display current sites and allows users to download the data in various formats. All other states in this study either visualize the vaccination sites on a map without allowing users to download the data, or simply list the addresses of each vaccination site per county in a table, which makes geocoding the vaccination sites difficult.
Weighted K-Means (WK-Means) is an unsupervised weighted centroid-based clustering algorithm. In WK-Means, the algorithm uses a weighted average instead of a standard mean when determining the location of cluster centroids. This "pulls" the centroid closer to points with higher weights (see figure below). Points can be weighted by any of their attributes. For my project, users will be able to weight points by total population, income, or BIPOC population of the associated neighborhood. This will allow the user to allocate resources in multiple ways to ensure equitable distribution.
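The "pull" of a weighted average can be seen in a small sketch. The function name `weighted_centroid` and the coordinates and weights are illustrative assumptions, not values from the project.

```python
def weighted_centroid(points, weights):
    """Weighted average of (lat, lon) points."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lat = sum(w * p[0] for p, w in zip(points, weights)) / total
    lon = sum(w * p[1] for p, w in zip(points, weights)) / total
    return lat, lon


# Two hypothetical neighborhood centroids on the same latitude.
points = [(39.0, -76.0), (39.0, -77.0)]

plain = weighted_centroid(points, [1, 1])          # equal weights: the midpoint
pulled = weighted_centroid(points, [3000, 1000])   # weighted by population
```

With equal weights the result is the midpoint, at longitude -76.5. With the first neighborhood weighted three times as heavily, the centroid moves to longitude -76.25, closer to that neighborhood.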
1. Pick k random points as centroids and initialize the weights wk = 1/k, where k is the number of vaccination sites.
2. Assign all other points to the centroid that minimizes the weighted distance. The distance between point i and the centroid of cluster k is calculated by the formula below, where the Haversine (great-circle) distance is used as the distance function.
3. Recalculate each centroid as the mean latitude and longitude within each cluster.
4. Update the weights according to the formula below, where alpha is chosen by the user and beta = 1 - alpha.
5. Repeat steps 2 through 4 until the coordinates of the centroids no longer change.
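The loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: it weights each point by an attribute when recomputing centroids and uses the Haversine distance for assignment, but it omits the per-cluster weights wk and the alpha/beta update of step 4, since those formulas are not reproduced here. The function names, the `seed` and `max_iter` parameters, and the sample data are all hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def weighted_kmeans(points, weights, k, seed=0, max_iter=100):
    """Simplified point-weighted k-means on (lat, lon) points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k random points as centroids
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest centroid by Haversine distance.
        labels = [min(range(k), key=lambda c: haversine_km(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the weighted average of its members.
        updated = []
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels)
                       if lab == c]
            total = sum(w for _, w in members)
            if total == 0:
                updated.append(centroids[c])  # keep an empty cluster in place
            else:
                updated.append((sum(w * p[0] for p, w in members) / total,
                                sum(w * p[1] for p, w in members) / total))
        if updated == centroids:  # stop once the centroids no longer move
            break
        centroids = updated
    return centroids, labels


# Hypothetical neighborhood centroids in two distant groups, with weights.
points = [(39.00, -76.0), (39.01, -76.0), (30.00, -97.0), (30.01, -97.0)]
weights = [3, 1, 1, 1]
centroids, labels = weighted_kmeans(points, weights, k=2)
```

Averaging latitude and longitude directly is a planar approximation; it is reasonable at county scale but would need care near the poles or the antimeridian.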
The points chosen by WK-Means will likely not line up with the infrastructure needed to distribute vaccinations. Therefore, a nearby search is needed to find spaces that can meet these requirements. I have chosen ten categories of locations to search for: pharmacies, grocery stores, community centers, churches, schools, colleges, universities, clinics, medical centers, and hospitals.
The search uses forward geocoding provided through the Mapbox API. The figure to the right displays the points chosen by WK-Means (black) and the closest infrastructure (red) for each neighborhood in Baltimore City. The median distance between the old and new points is 274 meters.
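The snapping step can be sketched as a nearest-candidate search. The sketch assumes the place search has already returned a list of candidate facilities; it makes no API calls, and the candidate records, field names, and coordinates are hypothetical.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def snap_to_nearest(site, candidates):
    """Return the candidate facility closest to a clustered site."""
    return min(candidates,
               key=lambda c: haversine_m(site, (c["lat"], c["lon"])))


# Hypothetical search results near one WK-Means point.
candidates = [
    {"name": "Example Pharmacy", "category": "pharmacy",
     "lat": 39.2910, "lon": -76.6100},
    {"name": "Example School", "category": "school",
     "lat": 39.3000, "lon": -76.6300},
]
site = (39.2905, -76.6108)

chosen = snap_to_nearest(site, candidates)
moved_m = haversine_m(site, (chosen["lat"], chosen["lon"]))
```

Collecting `moved_m` for every site would give the distribution of shifts between the clustered points and the facilities that replace them.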
Since WK-Means is an unsupervised model, there is no target variable and no typical metric for evaluating performance. Given this, I have created two custom metrics for performance evaluation:
Average distance from neighborhood to nearest vaccination site.
Percent of vaccination sites located inside communities of interest.
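A minimal sketch of how these two metrics could be computed follows. It assumes each vaccination site has already been tagged with the ID of the block group that contains it (that point-in-polygon step is not shown), and that distances are unweighted Haversine distances. All function names, IDs, and coordinates are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def mean_nearest_site_distance_km(neighborhoods, sites):
    """Average distance from each neighborhood centroid to its nearest site."""
    nearest = [min(haversine_km(n, s) for s in sites) for n in neighborhoods]
    return sum(nearest) / len(nearest)


def percent_sites_in_communities(site_block_groups, communities_of_interest):
    """Share of sites whose containing block group is a community of interest."""
    inside = sum(1 for bg in site_block_groups if bg in communities_of_interest)
    return 100.0 * inside / len(site_block_groups)


# Hypothetical inputs.
neighborhoods = [(0.0, 0.0), (0.0, 1.0)]
sites = [(0.0, 0.0)]
mean_km = mean_nearest_site_distance_km(neighborhoods, sites)

site_block_groups = ["bg-a", "bg-b", "bg-c", "bg-d"]
communities_of_interest = {"bg-a", "bg-c"}
pct = percent_sites_in_communities(site_block_groups, communities_of_interest)
```

Computing both metrics for the current sites and for the generated sites makes the two distributions directly comparable.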
A web app was created using Streamlit that allows users to generate vaccination sites based on chosen parameters. The user selects the county and attribute of interest, as well as a weighting parameter. An interactive distribution map and histogram are then generated, detailing the geographic distribution of the chosen attribute. An additional interactive distribution map displays the location of each vaccination site. When the cursor hovers over a site, a text box pops up displaying the name, address, and type of facility. A downloadable table at the bottom of the app details each vaccination site, alongside a table showing the mean distance from each neighborhood to its nearest vaccination site.