Data Critique

A Deep Dive into the Data

Our project utilized the dataset found at NIDA Drug Death Overdoses 1999-2022 Dataset. Data should never be used without critically analyzing the source, content, and implications of a given dataset. In this critique, we break down the characteristics of this particular dataset and analyze its benefits and shortcomings.


A graphic of a woman holding a clipboard and standing next to a person-sized tablet. The tablet reads "data" and is printing a long sheet with various analytics and visualizations printed on it.


What does the data look like?

This dataset breaks down overdose deaths in the United States from 1999 to 2022 by a number of variables, such as the age, gender, and race of the individuals who died. These variables are categorized as follows:

Age:
- total population
- 15-24 age range
Gender:
- female
- male
Race:
- American Indian / Alaskan Native (Non-Hispanic)
- Asian (Non-Hispanic)
- Black (Non-Hispanic)
- Hispanic
- Native Hawaiian / Other Pacific Islander (Non-Hispanic)
- White (Non-Hispanic)

The dataset also breaks down the type of drug used:

Opioid class:
- Prescription opioids
- Heroin
- Synthetic opioids other than methadone (primarily fentanyl)

Non-opioids:
- Benzodiazepines
- Stimulants
- Psychostimulants

Where does the data come from?

The National Center for Health Statistics (NCHS) at the Centers for Disease Control and Prevention (CDC) collected this information on deaths from fatal opioid overdoses. This data is available for public use at CDC WONDER, where both preliminary and final datasets can be viewed. Though we were not able to find definitive proof of this, it appears that the information is collected on a monthly basis through the CDC's Vital Statistics Rapid Release program. As a result of the monthly collection schedule, death count data may be slightly off, because some deaths are reported belatedly. We were not able to find any additional documentation about who was involved in the data collection process, what decisions were made about the dataset prior to publication, or any cleaning that occurred.

What can the data tell us?

U.S. overdoses have dramatically increased since 1999

This dataset covers 1999 to 2022, allowing long-term trends and patterns in overdoses to be analyzed. Overall, the rate of opioid overdose deaths per 100,000 people increased dramatically over that period.
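For readers unfamiliar with the measure, a rate per 100,000 people is the number of deaths divided by the population, scaled by 100,000. The Python sketch below illustrates that arithmetic only. The function name and the example figures are hypothetical and are not taken from the NIDA dataset, and published rates may be age-adjusted, which this sketch does not attempt.

```python
def rate_per_100k(deaths: int, population: int) -> float:
    """Return the crude death rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so simple inputs give exact results.
    return deaths * 100_000 / population


# Hypothetical figures for illustration only (not values from the dataset).
example_rate = rate_per_100k(deaths=50, population=200_000)
print(f"{example_rate:.1f} deaths per 100,000 people")
```

Expressing deaths as a rate rather than a raw count is what makes it possible to compare groups of very different sizes.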

Some racial groups are disproportionately affected by fatal opioid overdose

By allowing us to visualize the changing rates of overdose deaths based on racial data, this dataset helps identify which demographic groups are at the highest risk for overdose. For example, we can see that by 2022, Indigenous* populations were by far the most disproportionately affected racial group.

*Indigenous populations are referred to as “American Indian / Alaskan Native” in much of the literature, as well as in the dataset we used to create our visualizations. We have chosen to use the word “Indigenous” instead, as the use of the term “Indian” is controversial among certain Indigenous communities.

The groups that are most affected have changed over time

Chronological mapping of this data shows us that the demographics of fatal overdoses have shifted over the years, with certain groups seeing more dramatic increases in death rates than others. For example, we see in the data that while White and Indigenous populations were initially most at risk for fatal overdose, by 2020, overdose death rates in White communities had stabilized while overdose death rates in Black communities rose. Being able to visualize these shifts across communities helps us understand how crucial changes in American society have impacted some communities more than others.

Fatal overdoses have historically risen in connection to specific events

Significantly, our chronological data also shows precisely the years when opioid overdoses sharply increased, such as in 2013 when fentanyl flooded the U.S. drug market, and in 2020 when the COVID-19 pandemic caused overdose rates to surge dramatically.
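One way to locate such years is to compute the percent change in the death rate from each year to the next and rank the results. The sketch below is an illustration under stated assumptions, not the method we used to build our visualizations. The function names are hypothetical, and the example values are made up rather than drawn from the dataset.

```python
def year_over_year_change(rates_by_year: dict[int, float]) -> dict[int, float]:
    """Map each year to its percent change from the preceding year in the series."""
    years = sorted(rates_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        previous_rate = rates_by_year[prev]
        if previous_rate == 0:
            continue  # percent change is undefined when the earlier rate is zero
        changes[curr] = (rates_by_year[curr] - previous_rate) / previous_rate * 100
    return changes


def sharpest_increases(rates_by_year: dict[int, float], top_n: int = 2) -> list[int]:
    """Return the years with the largest percent increases, largest first."""
    changes = year_over_year_change(rates_by_year)
    return sorted(changes, key=changes.get, reverse=True)[:top_n]


# Made-up rates per 100,000 for illustration only (not values from the dataset).
example_rates = {2001: 4.0, 2002: 5.0, 2003: 5.5, 2004: 8.8}
print(sharpest_increases(example_rates))
```

Ranking by percent change rather than absolute change highlights years where the rate jumped relative to its previous level. Either measure could be reasonable depending on the question being asked.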

Fatal overdoses are primarily caused by fentanyl

Visualizations of death rates based on drug type allow for deeper analysis. In our analysis, for example, we focused on fentanyl, as it was shown to contribute the most to the increase in deaths in recent years. By cross-referencing fentanyl-related deaths with racial data, we were able to confirm that fentanyl has been driving the majority of drug overdose deaths and that its impact is disproportionate across racial groups. Thus, being able to compare the macro-level visualizations with micro-level visualizations allowed us to isolate specific variables that were correlated with changes on the larger scale.

Fentanyl is often consumed with other drugs

The data shows that fentanyl is often consumed in combination with other drugs, and that this can also lead to fatal overdose. The dataset allows us to visualize which drugs, when mixed with synthetic opioids, are the most fatal.

What can't the data tell us?

The dataset is missing the most recent data

The dataset lacks information for the most recent full year, 2023. Preliminary data from 2023 suggests that recent interventions involving increased access to naloxone have helped decrease death rates ("You can help reverse the overdose epidemic"). However, due to the lack of 2023 data in this dataset, we are unable to visualize these changes.

The data doesn't include information on many crucial identity markers

There are many identity markers besides race, gender, and age that may be significant risk factors for fatal overdose. For example, the dataset does not include the locations where these deaths most commonly occur, making it difficult to analyze whether location is a significant contributing factor to these opioid overdoses. Given that our readings showed that other non-racial identity factors are indeed significant indicators of fatal overdose risk, the lack of data on identity factors besides race, age, and gender is a clear shortcoming of the dataset (Alterkruse).

Moreover, the theoretical framework of intersectionality reminds us that these various identity variables are not discrete labels, but rather the result of one’s place within a web of intricately intertwined axes of power. To give just one simple example, the history of redlining in the United States ties race to an individual's geographical location. Thus, none of these variables can truly be isolated from one another.

"Race" as an identity category is arbitrary and lacks specificity

It is important to acknowledge that “race,” as it is defined within the NIDA dataset, is an imperfect measure. Taking the social science understanding of the term “race” to refer not to an inherent identity, but rather to an externally assigned identity based on one’s physical characteristics, we can understand this measure as one that is potentially very misaligned with individuals’ sense of self and the communities they belong to. Additionally, there is no indication of how individuals who belong to more than one racial group are counted within this system.

Moreover, we can understand these generalizations of race to be too broad. To take the category of “Asian” as an example, it is not a given that people of East Asian, South Asian, and Southeast Asian communities experience the same challenges. And this does not even consider how these distinctions themselves further break down into national, local, and even generational distinctions. Lacking these types of data makes it more difficult to accurately direct important resources to the communities that need them the most.

Other identity markers are similarly problematic

We can also critically examine the category of gender in this dataset. The use of just "male" and "female" perpetuates an essentialist view of sex / gender, and fails to consider other gender identities, such as non-binary and transgender identities. There is no indication of how gender non-conforming individuals have been counted in the data: whether they were removed as outliers or incorrectly subsumed into one of the binary options.

The dataset only collected age-specific information for the 15-24 age group. The data would be more complete with information on other age groups.

Things to keep in mind while reading the data

The data can potentially perpetuate existing racial prejudices

The dataset’s use of race as its primary area of analysis carries a risk of stereotyping, which may worsen public perception of certain racial groups. As analysts, we should keep in mind that correlation and causation are not the same, and that the identity factors correlated with drug use disorders are multitudinous and intricately connected.

Prejudice makes its way into legislation

The reinforcement of stereotypes can have a significant influence on policy-making, which risks further entrenching systemic discrimination against certain communities.

The biases behind the data are invisible

Due to a lack of clear documentation around how the dataset was collected and processed, we are unable to critically analyze the underlying conditions that created this dataset. Thus, it is important to keep in mind that there are invisible biases shaping this data.

A photo of red and white pills spilling out of a white pill bottle. There is a teal background.


View Our Dataset

This dataset was adapted from the NIDA Drug Death Overdoses 1999-2022 Dataset in order to isolate specific variables, such as race, gender, and drug type.

Edited Data

15-24 Overdoses.xlsx
