Conclusion & Results

Note: See "Models Implemented" for results and technical analysis of findings.


This study spans many areas of the opioid crisis, including opioid prescription behavior, opioid overdose patterns by drug type and state, the correlation between healthcare cost and opioid overdose, and how socioeconomic status relates to the opioid crisis. This breadth of study allows for the investigation of many significant research questions that can improve the lives of those affected by opioid use and overdose.

The "Medicare Part D- By Prescribers" dataset from the Centers for Medicare & Medicaid Services covers nationwide prescriber behaviors and patterns, including each prescriber's credentials, location, geography code (rural vs. urban), and prescription information. Prescribing behaviors were found to vary by provider credentials, suggesting that education and training significantly influence opioid prescribing habits. For example, dentists were distinguishable from physician assistants based on their opioid costs and prescribing rates. These patterns are important to uncover because a medical professional's treatment philosophy is heavily influenced by their education. If certain provider types are over-prescribing opioids, patients are directly affected through increased susceptibility to addiction and overdose. Furthermore, the analysis revealed a distinct shift in the prescribing of long-acting opioids. While overall opioid prescription rates in the US are decreasing over time, the share of long-acting opioids among all opioid prescriptions began increasing sharply in 2017. This shift warrants attention: the relative risks of long-acting versus short-acting opioids should be addressed, and further work is needed to understand how trends in prescription type may affect patients.
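The long-acting share discussed above is a simple ratio, and a short sketch shows why it can rise even while total opioid prescribing falls. The function name, the dictionary keys, and every number below are hypothetical placeholders chosen for illustration; they are not values from the Medicare Part D dataset.

```python
def long_acting_share(long_acting_claims, total_opioid_claims):
    """Percentage of all opioid claims that are long-acting opioids."""
    if total_opioid_claims == 0:
        return None  # share is undefined when there are no opioid claims
    return 100 * long_acting_claims / total_opioid_claims

# Placeholder counts (NOT real data): total claims fall, long-acting claims rise.
claims = {
    "year_a": {"long_acting": 100, "total": 1000},
    "year_b": {"long_acting": 120, "total": 800},
}

shares = {
    year: long_acting_share(c["long_acting"], c["total"])
    for year, c in claims.items()
}
print(shares)
```

Because the denominator shrinks while the numerator grows, the share increases between the two placeholder years even though overall prescribing declines.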

Applying K-means clustering to 2019 drug overdose rates by location and drug type revealed four distinct geographic patterns in substance abuse. For example, states in the Northeast and parts of the Midwest have the highest overdose rates, particularly involving fentanyl and heroin. Apriori frequent pattern mining on the same dataset showed a strong co-occurrence of three specific drugs: fentanyl, heroin, and methamphetamine. Together, these results show how drug use patterns differ regionally. Public health officials can use this information to design data-driven prevention and intervention strategies that address the specific drug abuse patterns in each region. Policymakers can likewise use it to develop legislation that addresses regional drug trends before they escalate into nationwide public health crises.
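As a rough illustration of the frequent pattern mining step, the sketch below implements a minimal Apriori pass in plain Python. It is not the study's code: the transactions are invented toy records (each set lists the drugs flagged together in one hypothetical record), and the 0.4 minimum support threshold is an assumption.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Toy records (NOT real data).
transactions = [
    {"fentanyl", "heroin", "methamphetamine"},
    {"fentanyl", "heroin"},
    {"fentanyl", "heroin", "methamphetamine"},
    {"cocaine", "fentanyl"},
    {"methamphetamine"},
]

frequent = apriori(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy input, cocaine appears in only one of five records and is dropped at the first level, so no larger itemset containing it is ever counted. That pruning is what keeps Apriori tractable on wider drug-type tables.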

Applying Multiple Linear Regression and K-means clustering to the “Cost Component of Opioid Use Disorder and Fatal Opioid Overdose, by Jurisdiction” dataset uncovered several patterns. The regression highlighted the importance of criminal justice and productivity costs, which were significantly associated with the rate of opioid use in each state. This pattern emerged from the relative strength of the regression coefficients across these variables. The same approach applied to the other cost columns points to ways of allocating taxpayer money that would benefit each state as a whole. In addition, clustering was performed on the summed cost of opioid use disorder and on the summed cost of fatal opioid overdose, respectively. The resulting clusters were strongly correlated with each other and identified states that share similar spending patterns. These patterns can show states with a higher case count per capita how states with a lower case count per capita allocate their money, which may help decrease deaths, disorders, and other related issues. Patterns like these can help states address their respective issues and help resolve the crisis nationwide more efficiently.
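One common way to compare coefficient strength across predictors measured on different scales is to fit the regression on z-scored variables. The sketch below does this in plain Python via the normal equations. It is an illustration under stated assumptions, not the study's model: the variable names are hypothetical, and the data are randomly generated so that the first predictor dominates by construction, which says nothing about the real dataset.

```python
import random
from statistics import mean, pstdev

def zscore(col):
    """Center a column at 0 and scale it to unit standard deviation."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def standardized_coefficients(columns, y):
    """OLS on z-scored data; no intercept is needed after centering."""
    Z = [zscore(c) for c in columns]
    yz = zscore(y)
    p = len(Z)
    XtX = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(p)] for i in range(p)]
    Xty = [sum(a * b for a, b in zip(Z[i], yz)) for i in range(p)]
    return solve(XtX, Xty)

# Synthetic data (NOT real): the outcome is built mostly from the first column.
random.seed(0)
n = 40
criminal_justice = [random.gauss(0, 1) for _ in range(n)]
lost_productivity = [random.gauss(0, 1) for _ in range(n)]
health_care = [random.gauss(0, 1) for _ in range(n)]
opioid_use_rate = [
    3 * cj + 0.5 * lp + random.gauss(0, 0.1)
    for cj, lp in zip(criminal_justice, lost_productivity)
]

betas = standardized_coefficients(
    [criminal_justice, lost_productivity, health_care], opioid_use_rate
)
print([round(b, 3) for b in betas])
```

Standardized coefficients describe association within the fitted data only; they do not by themselves show that changing a budget line would change the outcome.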

Results from support vector machines and hierarchical clustering suggest a relationship between the number of opioid overdoses and both the cost per capita and the cost of fatal overdoses within this dataset's demographic information. These patterns point to a need to allocate resources to opioid addiction education and prevention, both to reduce the costs that opioid addiction places on states and to support a healthier future. Education is the foundation of prevention, and directing more resources toward that effort should help reduce some of the high overdose rates seen today.
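For readers unfamiliar with hierarchical clustering, the following is a minimal agglomerative sketch in plain Python. The source does not state which linkage or distance was used, so average linkage with Euclidean distance is an assumption here, and the six points (overdose rate, cost per capita) are invented placeholders rather than values from the dataset.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def avg_link(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Placeholder points (NOT real data): (overdose rate, cost per capita).
points = [
    (10, 2.0), (11, 2.2), (12, 2.1),
    (30, 6.0), (32, 6.5), (31, 6.2),
]

clusters = agglomerative(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

With real features on very different scales, standardizing each column before computing distances is usually advisable so that one feature does not dominate the grouping.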

Future studies can focus on expanding the models with more up-to-date data and incorporating factors like socioeconomic status and access to healthcare to explain regional differences in greater depth. Additional studies could compare spending on educational resources against opioid overdose rates over time to see whether those resources have had a direct impact on the rising yearly rate of opioid overdoses.

Limitations & Ethical Concerns

One limitation of this study was the restricted availability of health data under the Health Insurance Portability and Accountability Act (HIPAA). HIPAA is a federal law passed in 1996 to protect the privacy and security of patient health information. Because of these privacy protections, demographic information and other items considered personally identifying are scrubbed from the datasets to ensure anonymity. All data must be treated as sensitive to protect the rights of participants and ensure ethical compliance when working with healthcare data.

Another limitation of this project stems from the nature of the data. Most of the data provided is aggregated, and some of it lacks reporting for certain demographic areas. Under-reporting in rural areas could skew what the models learn. In addition, drawing conclusions about individuals from aggregate demographic costs risks an ecological fallacy, because it is uncertain how well the sample provided represents the underlying population.

When interpreting the results, there is a high risk of stigmatizing states and providers with specific credentials if context is not taken into consideration. It is imperative to combine our model insights with public health context before making policy recommendations that could reinforce harmful stereotypes or lead to poor intervention decisions. Additionally, ensemble models, like Random Forest, can be difficult to understand and communicate, further limiting interpretability.