We invite submissions conducting NLP+CSS research on the Opioid Industry Document Archive (OIDA), a collaborative undertaking of the University of California, San Francisco and Johns Hopkins University. Submissions will be evaluated for technical rigor and potential impact by CS and public health researchers. Prizes in the form of travel grants to ACL and the NLP+CSS workshop will be awarded to top submissions.
The U.S. opioid crisis is a multi-decade public health emergency: tens of thousands of Americans die from opioid overdoses each year, profoundly affecting families, communities, healthcare systems, and the economy. The crisis began in the late 1990s with widespread oversupply of powerful painkillers and has evolved since then in several waves, most recently with soaring rates of overdoses attributable to heroin and illicit fentanyl. Landmark litigation in federal and state courts has been important both in disclosing drivers of the epidemic and in yielding tens of billions of dollars that communities are now using to reduce and prevent further harms.
The UCSF-JHU Opioid Industry Document Archive (OIDA) is a unique collection of materials that have been publicly disclosed as a result of this litigation. As of November 2025, OIDA contains over 6 million documents detailing the actions that led to the crisis, ranging from internal emails from pharmaceutical manufacturers, distributors and pharmacies, to depositions of industry executives. Some documents also relate to efforts to increase opioid sales in other countries, where the “industry playbook” developed in the U.S. has been applied in an effort to expand markets.
Despite its remarkable promise, OIDA shares a challenge with many valuable databases and repositories: extracting useful insights from millions of unstructured text documents with limited metadata remains difficult.
Recommended additional background:
We invite task participants to conduct data analyses of OIDA and/or develop methodology and tools to support such work. While we invite broad submissions targeting any relevant social science, public health, policy, or law research questions with OIDA, we offer a few examples of potential directions of focus:
Classify documents discussing promotional strategies and identify key industry tactics to increase sales and evade government regulation
Identify internal corporate communications that treat serious issues, such as opioid addiction, with inappropriate tone
Detect misrepresentation of scientific data in corporate communications
Link high-pressure workplace culture to aggressive sales practices
Table B in the Supplementary Materials of Alexander et al. (2022) contains additional suggested research questions.
Submissions will take the form of a standard research paper in ACL format, either short (4 pages) or long (8 pages), plus an unlimited appendix. Because award-nominated submissions will be judged by a panel of public health and NLP experts for their technical rigor and potential impact (see Evaluation Criteria below), they are expected to carefully document all work, including, for example, links to the specific OIDA documents studied, raw model outputs and data analyses, and easily runnable code.
Specific eligibility requirements:
Submissions must directly use OIDA
Submissions must focus on analyzing content in OIDA (e.g. using OIDA as training data for an unrelated task would not be eligible)
Submissions must involve text and language. Multi-modal data processing is welcome, but, for example, exclusively analyzing images in OIDA would not be eligible
As applicable, submissions must include open-source code
December 15, 2025: Shared task call release. Participants are encouraged to fill out the expression of interest form as soon as possible.
January and February 2026: Office hours (to be scheduled)
February 5, 2026: Interest form closes
March 5, 2026: Submission deadline
April 28, 2026: Notification of general acceptance to the workshop
May 5, 2026: Announcement of Winners
We recommend focusing on these OIDA collections, which are complete (not currently growing):
Teva and Allergan Documents (description)
Insys Litigation Documents (description)
West Virginia DEA Investigation Collection (description)
San Francisco Walgreens Litigation Documents (description)
Oklahoma Opioid Litigation Documents (description)
Mallinckrodt Litigation Documents (description)
McKinsey Documents (description)
National Prescription Opiate Litigation Documents (description)
Washington Post Opioid Collection (description)
Purdue Pharma HOC Investigation (description)
Purdue Pharma Bankruptcy Transcripts Collection (description)
Ohio Pharmacy Litigation Documents (description)
Kentucky Opioid Litigation Documents (description)
Florida Walgreens Litigation Documents (description)
KHN OxyContin Collection (description)
Metadata and OCR text of the documents can be downloaded as collection-level ZIP files from the Industry Documents Library. Original PDFs of the documents and, for certain documents, native file formats like PowerPoint and Excel can be retrieved using SciServer or AWS as described in the OIDA Toolbox. Alternatively, the OIDA team is willing to prepare subsets of these documents in Parquet format for teams preparing a submission for this shared task. If interested, please contact opioidarchive@jh.edu.
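As a starting point for working with a downloaded collection-level ZIP file, the following is a minimal sketch in Python using only the standard library. It assumes, without confirmation from the archive's documentation, that the OCR text is stored as UTF-8 plain-text members ending in ".txt"; the function name iter_text_members and the file name in the usage note are hypothetical, and the actual layout of the ZIP files may differ.

```python
import zipfile
from typing import Iterator, Tuple


def iter_text_members(zip_path: str, suffix: str = ".txt") -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for each ZIP member whose name ends in `suffix`.

    Assumption: OCR text is stored as plain-text members. Adjust `suffix`
    (or the decoding step) to match the real layout of the download.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Sort names so that iteration order is reproducible across runs.
        for name in sorted(archive.namelist()):
            if name.lower().endswith(suffix):
                raw = archive.read(name)
                # OCR output can contain invalid bytes; replace rather than fail.
                yield name, raw.decode("utf-8", errors="replace")
```

For example, `for name, text in iter_text_members("collection.zip"): ...` would stream documents one at a time without extracting the whole archive to disk, which may matter for the larger collections.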
We will additionally hold office hours sessions in January and February to assist with questions about the data or feedback on project plans. To receive announcements about office hours, please fill out the expression of interest form.
If you are interested in participating, please fill out this brief expression of interest form. We will send updates about the task, including office hours times in January and February, to form respondents.
Between one and four shared task submissions will be selected to receive travel grants of $1,000-2,000 to attend ACL and the NLP+CSS Workshop.
Additional submissions will be selected for honorable mention awards, which will be highlighted at the workshop, but will not receive travel grants.
Submissions will first be reviewed through the standard workshop review process. All submissions that pass peer review will be invited to be presented at the workshop and optionally included in proceedings. Authors may make non-archival submissions, but public versions of submissions must exist by the time of the workshop to be eligible for travel grants.
After the initial peer review, award-nominated shared task submissions will also be further evaluated by selection committees for two primary criteria:
Technical Rigor, evaluated by a committee of computer science researchers
Appropriate application and comparison of NLP methods.
Thorough evaluation and error analysis.
Potential Societal Impact, evaluated by a committee of public health researchers
Impact of computational artifacts (e.g., annotated data or tools built)
Impact of public health findings (i.e., the conclusive results)
Rankings from both committees will be combined, with equal weight given to both criteria, in order to select shared task winners and honorable mention awards.
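The exact procedure for combining the two rankings is not specified here. One plausible reading of "equal weight" is to average each submission's rank position across the two committees, as in the sketch below. The function name combine_rankings, the alphabetical tie-break, and the submission identifiers are illustrative assumptions, not the organizers' stated method.

```python
from typing import Dict, List


def combine_rankings(technical: List[str], impact: List[str]) -> List[str]:
    """Merge two best-first rankings by equally weighted mean rank position.

    Assumption: both committees rank the same set of submissions, and ties
    in mean rank are broken alphabetically by submission id.
    """
    if set(technical) != set(impact):
        raise ValueError("both committees must rank the same submissions")

    # Rank positions start at 1 for the top submission.
    tech_pos: Dict[str, int] = {s: i + 1 for i, s in enumerate(technical)}
    impact_pos: Dict[str, int] = {s: i + 1 for i, s in enumerate(impact)}

    # Equal weight: each committee contributes half of the combined score.
    mean_rank = {s: 0.5 * tech_pos[s] + 0.5 * impact_pos[s] for s in tech_pos}

    # Lower mean rank is better.
    return sorted(mean_rank, key=lambda s: (mean_rank[s], s))


if __name__ == "__main__":
    # Mean ranks: A = 2.0, B = 1.5, C = 2.5
    print(combine_rankings(["A", "B", "C"], ["B", "C", "A"]))  # ['B', 'A', 'C']
```

Mean rank is only one option; a committee could equally combine normalized scores instead of positions, which would preserve how far apart submissions were judged to be.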