The first MedExACT shared task focuses on detecting and labeling medical decisions in ICU discharge summaries. A medical decision is defined by the Decision Identification and Classification Taxonomy for Use in Medicine (DICTUM) (Ofstad et al., 2016), covering ten categories such as Defining Problem, Therapeutic Procedure, and Evaluation decisions. The MedDec dataset contains over 56k expert-annotated decision spans drawn from de-identified ICU discharge summaries in MIMIC-III, supplemented with patient demographics and ten phenotypes (Cancer, Heart Disease, Lung Disease, Chronic Neurologic Dystrophies, Chronic Pain, Alcohol Abuse, Substance Abuse, Obesity, Psychiatric Disorders, Depression). Systems will be evaluated for accuracy and robustness at both span and token levels, including stratified analyses by sex, race, and English proficiency.
Jan 1 2026: Call for participation
Jan 1 2026: Train/Validation data release
Mar 23 2026: Test set release
Apr 5 2026: Deadline to submit prediction results
Apr 24 2026: Results notification
May 5 2026: Deadline to submit system papers
July 7 2026 (tentative): Workshop day
Given a full discharge summary, systems must detect contiguous text spans that express medical decisions and assign each span one of the ten DICTUM decision categories (Contact related, Gathering information, Defining problem, Treatment goal, Drug, Therapeutic procedure, Evaluating test result, Deferment, Advice and precaution, and Legal/insurance related), or a None label when no decision is present. The following table shows the percentage of annotated spans for each decision category across protected variables in the MedDec dataset. n is the number of discharge summaries for each category, and the last row shows the total count of decisions per variable.
Please join the Google group to receive notifications and register your team: https://groups.google.com/g/medexact-acl2026. If you have any questions, feel free to send an email to medexact-acl2026+owner@googlegroups.com.
We will cover the full ACL 2026 registration fee for one team member from each of the top three teams.
Here is a small example of the data. The data is stored in JSON format. Each annotation captures the discharge summary ID ([SUBJECT_ID, HADM_ID, ROW_ID], which enables linking notes with MIMIC-III records), the decision text, the decision label, start/end character offsets, and a span-specific ID to ease referencing each specific annotation:
{
  "annotator_id": "CALIML",
  "discharge_summary_id": "10814_101543_52781",
  "annotations":
  [
    {
      "decision": "cardiac arrest with heart block",
      "category": "3: Defining problem",
      "start_offset": "526",
      "end_offset": "556",
      "span_id": "AC_005"
    },
    {
      "decision": "GU : Foley in place",
      "category": "6: Therapeutic procedure",
      "start_offset": "4000",
      "end_offset": "4018",
      "span_id": "AC_048"
    },
    {...}
  ]
}
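As an illustration, the record above can be parsed with a few lines of Python. This is a sketch, not part of the official tooling; it assumes the record is available as a JSON string (here embedded inline and abbreviated to one annotation) and shows how the ID components and category labels can be unpacked:

```python
import json

# Sample record mirroring the structure above (abbreviated to one annotation).
record_json = """
{
  "annotator_id": "CALIML",
  "discharge_summary_id": "10814_101543_52781",
  "annotations": [
    {"decision": "cardiac arrest with heart block",
     "category": "3: Defining problem",
     "start_offset": "526",
     "end_offset": "556",
     "span_id": "AC_005"}
  ]
}
"""

record = json.loads(record_json)

# discharge_summary_id encodes SUBJECT_ID, HADM_ID, ROW_ID for joining
# against MIMIC-III note records.
subject_id, hadm_id, row_id = record["discharge_summary_id"].split("_")

for ann in record["annotations"]:
    # Offsets are stored as strings; cast to int before slicing note text.
    start, end = int(ann["start_offset"]), int(ann["end_offset"])
    # Categories are "<id>: <name>" strings.
    label_id, label_name = ann["category"].split(": ", 1)
    print(ann["span_id"], label_id, label_name, start, end)
```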
To account for inconsistencies and minor errors, we provide a script to clean the data. Please run this script on the data right after downloading. Please refer to the GitHub repo for the script: https://github.com/CLU-UML/MedDec?tab=readme-ov-file#clean-data
Please note that getting access to health data can be time-consuming, so we suggest applying as early as possible!
MedDec is built on top of MIMIC-III. Users must have access to MIMIC-III to use MedDec. Both MIMIC-III and MedDec are restricted-access resources.
To get access to MIMIC-III, you need to
Scroll down to the bottom
Sign in/up, and complete the required training and data use agreement
The MedDec training and validation data is available on PhysioNet. To get access, you need to
Scroll down to the bottom
Sign in/up, and complete the required training and data use agreement
If you need support accessing the data faster, please contact the organizers via email medexact-acl2026+owner@googlegroups.com.
Submissions are evaluated on both performance and robustness across demographic and language subgroups:
Base F1: we first compute a base performance score combining span- and token-level F1:
Worst-Group F1
We compute the same Base_Score separately for each of the following 9 subgroups:
Sex (Female, Male)
Race (White, African American, Hispanic, Asian, Others)
Patient Language Proficiency (English, Non-English)
Worst-Group F1: the minimum subgroup Base_Score across all the above subgroups
The final metric to rank systems is:
This evaluation rewards models that are both accurate and robust.
Evaluation script is available at: https://github.com/CLU-UML/MedDec/blob/main/evaluate.py
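For illustration only, here is a minimal sketch of this scoring scheme. The exact combinations are not spelled out on this page, so two assumptions are flagged below: that Base_Score is the unweighted mean of span- and token-level F1, and that the final metric averages Base_Score and Worst-Group F1. The authoritative definitions are in the official evaluate.py linked above:

```python
def base_score(span_f1: float, token_f1: float) -> float:
    # ASSUMPTION: unweighted mean of span- and token-level F1;
    # the official combination is defined in evaluate.py.
    return (span_f1 + token_f1) / 2

def worst_group_f1(group_scores: dict[str, float]) -> float:
    # Minimum Base_Score across the 9 demographic/language subgroups
    # (2 sex + 5 race + 2 language-proficiency groups).
    return min(group_scores.values())

def final_score(base: float, worst: float) -> float:
    # ASSUMPTION: final ranking metric averages overall and
    # worst-group performance, rewarding accurate *and* robust models.
    return (base + worst) / 2
```

Under these assumptions, a model with high average F1 but one weak subgroup is penalized, since the worst-group term pulls the final score down.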
Each team may submit up to 3 runs. The final score for the team will be the best of its submitted runs. The submission format is as follows:
Submit up to 3 separate JSON files named after your team ID: <team_id>-<x>.json. For example, abc-2.json refers to the second run of team abc.
The format should follow the data example above.
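The naming convention above can be checked with a small helper before uploading. This is an unofficial sketch; in particular, the allowed team-ID characters (letters and digits) are an assumption, since the page does not specify them:

```python
import re

# <team_id>-<run>.json, with at most 3 runs per team.
# ASSUMPTION: team IDs are alphanumeric; confirm with the organizers.
SUBMISSION_RE = re.compile(r"^[A-Za-z0-9]+-[123]\.json$")

def check_filename(name: str) -> bool:
    """Return True if `name` is a valid submission filename, e.g. abc-2.json."""
    return SUBMISSION_RE.fullmatch(name) is not None
```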
Teams will submit a system description paper describing their methods and findings.
Length: up to 4 pages (excluding references)
Format: ACL format
Review: peer-reviewed
Archival status: archival — accepted papers will appear in BioNLP workshop proceedings and be published in the ACL Anthology.
Submissions will be through SoftConf; the link will be added soon.
We use RoBERTa as the baseline model. The model architecture, alternative solutions, and their performance are available in Table 4 of (Elgaar et al., 2024). The code for the baseline model is available at https://github.com/CLU-UML/MedDec
The following papers describe the MedDec dataset used in this challenge and a demo of our baseline system. Papers submitted to this challenge using the MedDec dataset should cite these papers as follows:
Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, Hadi Amiri, and Leo Anthony Celi. 2024. MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries. In Findings of the Association for Computational Linguistics: ACL 2024, pages 16442–16455, Bangkok, Thailand. Association for Computational Linguistics.
Mohamed Elgaar, Hadi Amiri, Mitra Mohtarami, and Leo Anthony Celi. 2025. MedDecXtract: A Clinician-Support System for Extracting, Visualizing, and Annotating Medical Decisions in Clinical Narratives. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 481–489, Vienna, Austria. Association for Computational Linguistics.
Mohamed Elgaar (UMass Lowell)
Jiali Cheng (UMass Lowell)
Nidhi Vakil (UMass Lowell)
Mitra Mohtarami (Anselm College)
Adrian Wong (BIDMC)
Hadi Amiri (UMass Lowell)
Leo A. Celi (MIT)