Download the opioid data file: opioids.csv
Access the data via Google Sheets:
With the history of opioids in mind, we looked for trends or patterns in opioid overdoses from 1999 to 2019 that reflect the historical relationship between race and opioids. We began by cleaning the dataset: in every record, we checked for missing values, outliers, values formatted differently from the others in the same feature, and values that are simply invalid. An example of an invalid value would be a negative number, since the values in our dataset are rates, expressed as a number of people per 100,000. Where necessary, we would have standardized the data for consistency and removed any duplicates. Missing data could be handled either by deleting the row or by replacing the missing value(s) with the average of that feature, and we would have recorded every change made. Fortunately, the dataset had no missing values or other inconsistencies. We only removed the columns that we did not need for our research question: because we focus on opioid overdoses by race, we removed the columns for sex. The dataset did not need further cleaning.
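The cleaning checks described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the procedure we actually ran. The column names (year, race, sex, rate) and the sample values are assumptions made up for the example, and the real opioids.csv may be laid out differently.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for opioids.csv.
# Column names and values are assumptions for illustration only.
SAMPLE = """year,race,sex,rate
1999,White,All,3.1
1999,Black,All,3.5
2000,White,All,
2000,Black,All,-1.0
2000,Black,All,-1.0
2001,White,All,3.5
"""


def clean(rows, drop=("sex",)):
    """Return (cleaned_rows, change_log) for a list of dict records."""
    log = []
    seen = set()
    kept = []
    for i, raw in enumerate(rows):
        # Remove exact duplicates, judged on the full original record.
        key = tuple(raw.items())
        if key in seen:
            log.append(f"row {i}: removed duplicate")
            continue
        seen.add(key)

        # Drop the columns that are not needed for the research question.
        row = {k: (v or "").strip() for k, v in raw.items() if k not in drop}

        # A blank rate is "missing"; anything else must parse as a number.
        if row["rate"] == "":
            row["rate"] = None
        else:
            try:
                row["rate"] = float(row["rate"])
            except ValueError:
                log.append(f"row {i}: removed unparseable rate")
                continue
            # A rate per 100,000 people can never be negative.
            if row["rate"] < 0:
                log.append(f"row {i}: removed negative rate")
                continue
        kept.append((i, row))

    # Replace missing rates with the average of the valid ones.
    valid = [row["rate"] for _, row in kept if row["rate"] is not None]
    fill = mean(valid) if valid else None
    for i, row in kept:
        if row["rate"] is None and fill is not None:
            row["rate"] = fill
            log.append(f"row {i}: filled missing rate with feature average")

    return [row for _, row in kept], log


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned, log = clean(rows)
for entry in log:
    print(entry)
```

The change log mirrors the idea of recording every modification. Whether an invalid value should be deleted or replaced with the average is a judgment call; the sketch deletes invalid rows and fills blank ones only to show both options.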
After cleaning the data, we used applications such as Excel to plot it. We created a separate line graph for each race to see how the number of overdoses changed over time, and then overlaid the graphs to compare the races against each other. This gives us an idea of the disparities, if there are any. We also used a pie chart to see which race accounts for the most overdoses. Visualizations give us a better understanding of the situation than raw data. If certain races have a much higher rate of overdoses, the charts indicate where action needs to be taken. For accessibility, we used contrasting colors and put white space between the pie chart sections. Additionally, we labeled each line in the overlaid graph with the corresponding race to accommodate visual impairments. Another of our goals is to see the geographical distribution of opioid overdoses throughout the U.S. This required us to find other resources that provide the information needed to make a map displaying the most affected areas.
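The numbers behind these charts can also be prepared outside Excel. The sketch below is only an illustration under stated assumptions: the record layout (year, race, deaths, rate), the group names, and all values are hypothetical. It assumes the line graphs use the rate per 100,000 and the pie chart uses the death counts, which is one reasonable reading of the charts described above.

```python
from collections import defaultdict

# Hypothetical records: (year, race, deaths, rate per 100,000).
# Group names and values are made up for illustration.
RECORDS = [
    (1999, "Group A", 100, 2.0),
    (2019, "Group A", 300, 6.0),
    (1999, "Group B", 50, 3.0),
    (2019, "Group B", 50, 3.0),
]


def series_by_race(records):
    """Group rates into one time-ordered series per race (one line per race)."""
    series = defaultdict(list)
    for year, race, _deaths, rate in sorted(records):
        series[race].append((year, rate))
    return dict(series)


def share_of_total(records):
    """Return each race's fraction of all overdose deaths (pie chart slices)."""
    totals = defaultdict(int)
    for _year, race, deaths, _rate in records:
        totals[race] += deaths
    grand_total = sum(totals.values())
    return {race: total / grand_total for race, total in totals.items()}


lines = series_by_race(RECORDS)
shares = share_of_total(RECORDS)
for race, points in lines.items():
    print(race, points, f"{shares[race]:.0%} of deaths")
```

Each series can be plotted as one labeled line in the overlaid graph, and the shares map directly onto the pie chart sections.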
The opioid dataset reports only the count and rate of overdoses for each combination of the features seen above. Because a single dataset can present only limited information, external factors may be affecting the numbers in each record. The data does not account for unreported or untreated overdoses, underlying health conditions, or the individual's intake of other substances, medications, or drugs, which could inflate the number of "opioid induced" overdoses. While the data was collected across the U.S., opioid overdoses were likely not uniformly distributed, as certain cities or states probably contributed more to opioid use and the overdose rate than others. Although our study does not look at sex, the dataset reveals that males have a higher overdose rate than females. The dataset shows only the difference in rates, but there might be a psychological explanation for this large difference between the sexes. The dataset does not explicitly present reasons, but it leaves room for further research on specifics.