The first MedExACT shared task focuses on detecting and labeling medical decisions in ICU discharge summaries. A medical decision is defined by the Decision Identification and Classification Taxonomy for Use in Medicine (DICTUM) (Ofstad et al., 2016), covering ten categories such as Defining Problem, Therapeutic Procedure, and Evaluation decisions. The MedDec dataset contains over 56k expert-annotated decision spans drawn from de-identified ICU discharge summaries in MIMIC-III, supplemented with patient demographics and ten phenotypes (Cancer, Heart Disease, Lung Disease, Chronic Neurologic Dystrophies, Chronic Pain, Alcohol Abuse, Substance Abuse, Obesity, Psychiatric Disorders, Depression). Systems will be evaluated for accuracy and robustness at both span and token levels, including stratified analyses by sex, race, and English proficiency.
Jan 1 2026: Call for participation
Jan 1 2026: Train/Validation data release
Mar 23 2026: Test set release
Apr 5 2026: Deadline to submit prediction results
Apr 24 2026: Results notification
May 5 2026: Deadline to submit system papers
July 7 2026 (tentative): Workshop day
Given a full discharge summary, systems must detect contiguous text spans that express medical decisions and assign each span one of the ten DICTUM decision categories (Contact related, Gathering information, Defining problem, Treatment goal, Drug, Therapeutic procedure, Evaluating test result, Deferment, Advice and precaution, and Legal/insurance related), or a None label when no decision is present. The following table shows the percentage of annotated spans for each decision category across protected variables in the MedDec dataset. n is the number of discharge summaries for each category, and the last row shows the total count of decisions per variable.
Please join the Google group to receive notifications and register your team: https://groups.google.com/g/medexact-acl2026. If you have any questions, feel free to send an email to medexact-acl2026+owner@googlegroups.com.
We will cover the full ACL 2026 registration fee for one team member from each of the top three teams.
Here is a small example of the data. The data is stored in JSON format. Each annotation captures the discharge summary ID ([SUBJECT_ID, HADM_ID, ROW_ID], which enables linking notes with MIMIC-III records), the decision text, the decision label, start/end character offsets, and a span-specific ID to ease referencing each specific annotation:
{
  "annotator_id": "CALIML",
  "discharge_summary_id": "10814_101543_52781",
  "annotations":
  [
    {
      "decision": "cardiac arrest with heart block",
      "category": "3: Defining problem",
      "start_offset": "526",
      "end_offset": "556",
      "span_id": "AC_005"
    },
    {
      "decision": "GU : Foley in place",
      "category": "6: Therapeutic procedure",
      "start_offset": "4000",
      "end_offset": "4018",
      "span_id": "AC_048"
    },
    {...}
  ]
}
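As an illustration, the record above can be parsed with a few lines of Python. This is a sketch, not part of the official tooling; it assumes the record is available as a JSON string (here embedded inline and abbreviated to one annotation) and shows how the ID components and category labels can be unpacked:

```python
import json

# Sample record mirroring the structure above (abbreviated to one annotation).
record_json = """
{
  "annotator_id": "CALIML",
  "discharge_summary_id": "10814_101543_52781",
  "annotations": [
    {"decision": "cardiac arrest with heart block",
     "category": "3: Defining problem",
     "start_offset": "526",
     "end_offset": "556",
     "span_id": "AC_005"}
  ]
}
"""

record = json.loads(record_json)

# discharge_summary_id encodes SUBJECT_ID, HADM_ID, ROW_ID for joining
# against MIMIC-III note records.
subject_id, hadm_id, row_id = record["discharge_summary_id"].split("_")

for ann in record["annotations"]:
    # Offsets are stored as strings; cast to int before slicing note text.
    start, end = int(ann["start_offset"]), int(ann["end_offset"])
    # Categories are "<id>: <name>" strings.
    label_id, label_name = ann["category"].split(": ", 1)
    print(ann["span_id"], label_id, label_name, start, end)
```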
To account for inconsistencies and minor errors, we provide a script to clean the data. Please run this script on the data right after downloading. Please refer to the GitHub repo for the script: https://github.com/CLU-UML/MedDec?tab=readme-ov-file#clean-data
Please note that getting access to health data can be time-consuming, so we suggest applying as early as possible!
MedDec is built on top of MIMIC-III. Users must have access to MIMIC-III to use MedDec. Both MIMIC-III and MedDec are restricted-access resources.
To get access to MIMIC-III, you need to
Scroll down to the bottom
Sign in/up, and complete the required training and data use agreement
The MedDec training and validation data is available on PhysioNet. To get access, you need to
Scroll down to the bottom
Sign in/up, and complete the required training and data use agreement
If you need support accessing the data faster, please contact the organizers via email medexact-acl2026+owner@googlegroups.com.
Submissions are evaluated on both performance and robustness across demographic and language subgroups:
Base F1: we first compute a base performance score combining span- and token-level F1:
Worst-Group F1
We compute the same Base_Score separately for each of the following 9 subgroups:
Sex (Female, Male)
Race (White, African American, Hispanic, Asian, Others)
Patient Language Proficiency (English, Non-English)
Worst-Group F1: the minimum subgroup Base_Score across all the above subgroups
The final metric to rank systems is:
This evaluation rewards models that are both accurate and robust.
Evaluation script is available at: https://github.com/CLU-UML/MedDec/blob/main/evaluate.py
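For illustration only, here is a minimal sketch of this scoring scheme. The exact combinations are not spelled out on this page, so two assumptions are flagged below: that Base_Score is the unweighted mean of span- and token-level F1, and that the final metric averages Base_Score and Worst-Group F1. The authoritative definitions are in the official evaluate.py linked above:

```python
def base_score(span_f1: float, token_f1: float) -> float:
    # ASSUMPTION: unweighted mean of span- and token-level F1;
    # the official combination is defined in evaluate.py.
    return (span_f1 + token_f1) / 2

def worst_group_f1(group_scores: dict[str, float]) -> float:
    # Minimum Base_Score across the 9 demographic/language subgroups
    # (2 sex + 5 race + 2 language-proficiency groups).
    return min(group_scores.values())

def final_score(base: float, worst: float) -> float:
    # ASSUMPTION: final ranking metric averages overall and
    # worst-group performance, rewarding accurate *and* robust models.
    return (base + worst) / 2
```

Under these assumptions, a model with high average F1 but one weak subgroup is penalized, since the worst-group term pulls the final score down.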
Each team may submit up to 3 runs. The final score for the team will be the best of its submitted runs. The submission format is as follows:
Submit up to 3 separate JSON files named after your team ID: <team_id>-<x>.json. For example, abc-2.json refers to the second run of team abc.
The format should follow the data example above.
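The naming convention above can be checked with a small helper before uploading. This is an unofficial sketch; in particular, the allowed team-ID characters (letters and digits) are an assumption, since the page does not specify them:

```python
import re

# <team_id>-<run>.json, with at most 3 runs per team.
# ASSUMPTION: team IDs are alphanumeric; confirm with the organizers.
SUBMISSION_RE = re.compile(r"^[A-Za-z0-9]+-[123]\.json$")

def check_filename(name: str) -> bool:
    """Return True if `name` is a valid submission filename, e.g. abc-2.json."""
    return SUBMISSION_RE.fullmatch(name) is not None
```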
Teams will submit a system description paper describing their methods and findings.
Length: up to 4 pages (excluding references)
Format: ACL format
Review: peer-reviewed
Archival status: archival — accepted papers will appear in BioNLP workshop proceedings and be published in the ACL Anthology.
Submissions will be through SoftConf; the link will be added soon.
We use RoBERTa as the baseline model. The model architecture, alternative solutions, and their performance are available in Table 4 of (Elgaar et al., 2024). The code for the baseline model is available at https://github.com/CLU-UML/MedDec
The following papers describe the MedDec dataset used in this challenge and a demo of our baseline system. Papers submitted to this challenge using the MedDec dataset should cite these papers as follows:
Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, Hadi Amiri, and Leo Anthony Celi. 2024. MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries. In Findings of the Association for Computational Linguistics: ACL 2024, pages 16442–16455, Bangkok, Thailand. Association for Computational Linguistics.
Mohamed Elgaar, Hadi Amiri, Mitra Mohtarami, and Leo Anthony Celi. 2025. MedDecXtract: A Clinician-Support System for Extracting, Visualizing, and Annotating Medical Decisions in Clinical Narratives. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 481–489, Vienna, Austria. Association for Computational Linguistics.
Mohamed Elgaar (UMass Lowell)
Jiali Cheng (UMass Lowell)
Nidhi Vakil (UMass Lowell)
Mitra Mohtarami (Anselm College)
Adrian Wong (BIDMC)
Hadi Amiri (UMass Lowell)
Leo A. Celi (MIT)