MEDIQA 2021
Summarization in the Medical Domain
MEDIQA is a series of shared tasks on Medical NLP. Previous edition: MEDIQA 2019
Introduction
MEDIQA 2021 tackles three summarization tasks in the medical domain: consumer health question summarization, multi-answer summarization, and radiology report summarization. In this shared task, we will also explore the use of different evaluation metrics for summarization.
MEDIQA 2021 will be organized as part of the BioNLP 2021 workshop at NAACL 2021.
Join our mailing list: https://groups.google.com/g/mediqa-nlp
News
May 2021: The overview paper of the MEDIQA 2021 Shared Task is available: https://www.aclweb.org/anthology/2021.bionlp-1.8.pdf
April 2021: A post-challenge round has been opened to allow the submission of new runs on AIcrowd: https://www.aicrowd.com/challenges/mediqa-2021
Tasks
1) Summarization of Consumer Health Questions
Consumer health questions tend to contain a large amount of peripheral information that hinders automatic Question Answering (QA). Empirical QA studies based on manual expert summarization of these questions showed a substantial improvement of 58% in performance [1]. Effective automatic summarization methods for consumer health questions could therefore play a key role in enhancing medical question answering.
The goal of this task is to promote the development of new summarization approaches that specifically address the challenges of long and potentially complex consumer health questions.
Relevant approaches should be able to generate a condensed question expressing the minimum information required to find correct answers to the original question [2].
2) Summarization of Multiple Answers
Different answers can bring complementary perspectives that are likely to benefit the users of QA systems. The goal of this task is to promote the development of multi-answer summarization approaches that could solve simultaneously the aggregation and summarization problems posed by multiple relevant answers to a medical question [4].
3) Summarization of Radiology Reports
The automatic summarization of radiology reports has several clinical applications such as accelerating the radiology workflow and improving the efficiency of clinical communications.
This task aims to promote the development of clinical summarization models that are able to generate radiology impression statements by summarizing textual findings written by radiologists [7-8].
Datasets
Task 1: Question Summarization
Training Data: The MeQSum Dataset of consumer health questions and their summaries [2] could be used for training. Participants can use available external resources, including, but not limited to, medical QA datasets and question focus & type recognition datasets. For instance, the CHQs Dataset [3] contains additional annotations (e.g., medical entities, question focus, question type, keywords) for the MeQSum questions.
Validation and Test Sets: Consist of consumer health questions received by the U.S. National Library of Medicine (NLM) in December 2020 and their associated summaries, manually created by medical experts.
The validation set is available on the MEDIQA Github project.
The test set will be made available to registered participants, together with the other official MEDIQA 2021 test sets, on AIcrowd under the Resources section.
Task 2: Multi-Answer Summarization
Training Data: The MEDIQA-AnS Dataset [4] could be used for training. Participants can use available external resources (e.g. existing medical QA datasets).
Validation and Test Sets: The original answers are generated by the medical QA system CHiQA [5], which retrieves answers only from trustworthy medical information sources [6]. The summaries are manually created by medical experts.
The validation set is available on the MEDIQA Github project.
The test set will be made available to registered participants, together with the other official MEDIQA 2021 test sets, on AIcrowd under the Resources section.
Task 3: Radiology Report Summarization
Training Data: A subset from the MIMIC-CXR Dataset [13,14] could be used for training. Instructions and scripts to download this training set are described here: https://github.com/abachaa/MEDIQA2021/tree/main/Task3.
Participants can use available external resources; however, please note that the rest of the MIMIC-CXR reports, as well as the Indiana dataset, should not be used for training.
Validation set: A subset from the MIMIC-CXR and Indiana datasets, available on the MEDIQA Github project.
Test Set: The test set will be made available to registered participants, together with the other official MEDIQA 2021 test sets, on AIcrowd under the Resources section.
Registration
The registration & data usage agreement form is available under the Resources section of the AIcrowd projects.
The form covers the three tasks. You can download it from any of the three MEDIQA projects: QS@AIcrowd, MAS@AIcrowd & RRS@AIcrowd.
To register, you need to complete, sign, and upload the form. Once your registration is approved, you will be able to download the official test sets and submit your runs through the AIcrowd submission systems.
Submission & Evaluation
The AIcrowd platform will be used for releasing the test sets and submitting runs: https://www.aicrowd.com/challenges/mediqa-2021
Consumer Health Question Summarization: QS@AIcrowd
Multi-Answer Summarization: MAS@AIcrowd
Radiology Report Summarization: RRS@AIcrowd
Submission Format for the three tasks:
ID [tab] Summary
The summary must fit on one line (no line breaks); a minimal example of writing a run file in this format is sketched below.
Each team is allowed to submit up to 10 runs per task.
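The following Python sketch shows one possible way for a participant to produce a run file in the required format. It is purely illustrative: the predictions dictionary, the example IDs, and the output file name "run1.txt" are hypothetical placeholders; only the "ID [tab] Summary" layout and the single-line constraint come from the submission instructions above.

# Minimal sketch of writing a run file (hypothetical IDs, summaries, and file name).
predictions = {
    "Q1": "What are the treatments for chronic sinusitis?",
    "Q2": "What are the side effects of metformin?",
}

with open("run1.txt", "w", encoding="utf-8") as f:
    for example_id, summary in predictions.items():
        # Collapse any internal line breaks/extra whitespace so the summary
        # fits on a single line, then write "ID<TAB>Summary".
        one_line = " ".join(summary.split())
        f.write(f"{example_id}\t{one_line}\n")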
Evaluation Metrics:
ROUGE [9] will be used as the main metric to rank the participating teams [10], but we will also use several evaluation metrics better adapted to each task, such as HOLMS [11] and CheXbert [12].
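For a rough local check before submitting, participants may compute ROUGE on the validation set themselves. The sketch below uses the open-source rouge-score Python package as one possible option, with made-up reference and candidate strings; the official scoring toolkit, ROUGE variants, and any preprocessing are determined by the organizers and may differ.

# Illustrative local ROUGE check with the `rouge-score` package (pip install rouge-score).
# The reference and candidate strings are made up for demonstration only.
from rouge_score import rouge_scorer

reference = "What are the treatments for chronic sinusitis?"
candidate = "How is chronic sinusitis treated?"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.3f} recall={score.recall:.3f} f1={score.fmeasure:.3f}")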
Official Results
Organizers
Asma Ben Abacha, NLM/NIH
Chaitanya Shivade, Amazon
Yassine Mrabet, NLM/NIH
Yuhao Zhang, Stanford University, Amazon AWS AI
Curtis Langlotz, Stanford University
Dina Demner-Fushman, NLM/NIH
Important Dates
December 16, 2020: First call for participation, with information about the training data.
December 22, 2020: AIcrowd projects go public. Release of the training set for Task 3.
January 29, 2021: Release of the validation sets.
February 26, 2021: Release of the test sets. Run submission opens on AIcrowd.
March 5, 2021: Run submission deadline. Participants' ROUGE scores will be available on AIcrowd.
March 10, 2021: Release of the official results.
March 19, 2021: Paper submission deadline (see the submission website and instructions).
April 15, 2021: Notification of acceptance.
April 26, 2021: Camera-ready papers due (hard deadline).
June 11, 2021: BioNLP Workshop @NAACL'21
References
[1] "On the Role of Question Summarization and Information Source Restriction in Consumer Health Question Answering". Asma Ben Abacha & Dina Demner-Fushman. AMIA 2019 Informatics Summit.
[2] "On the Summarization of Consumer Health Questions". Asma Ben Abacha & Dina Demner-Fushman. ACL 2019. MeQSum Dataset.
[3] "Semantic Annotation of Consumer Health Questions". Halil Kilicoglu, Asma Ben Abacha, Yassine Mrabet, Sonya E Shooshan, Laritza Rodriguez, Kate Masterton & Dina Demner-Fushman. BMC Bioinformatics, 2018. CHQs Dataset
[4] "Question-Driven Summarization of Answers to Consumer Health Questions". Max E. Savery, Asma Ben Abacha, Soumya Gayen & Dina Demner-Fushman. Scientific Data, Nature, 2020. MEDIQA-AnS Dataset.
[5] "Consumer health information and question answering: helping consumers find answers to their health-related information needs". Dina Demner-Fushman, Yassine Mrabet & Asma Ben Abacha. JAMIA 2020.
[6] "A Question-Entailment Approach to Question Answering". Asma Ben Abacha & Dina Demner-Fushman. BMC Bioinformatics, 2019. MedQuAD Dataset.
[7] "Learning to Summarize Radiology Findings". Yuhao Zhang, Daisy Yi Ding, Tianpei Qian, Christopher D. Manning & Curtis P. Langlotz. EMNLP-LOUHI 2020.
[8] "Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports". Yuhao Zhang, Derek Merck, Emily Bao Tsai, Christopher D. Manning & Curtis P. Langlotz. ACL 2020.
[9] "ROUGE: A Package for Automatic Evaluation of Summaries". Chin-Yew Lin. ACL 2004.
[10] "Re-evaluating Evaluation in Text Summarization". Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu & Graham Neubig. EMNLP 2020.
[11] " HOLMS: Alternative Summary Evaluation with Large Language Models". Yassine Mrabet & Dina Demner-Fushman. COLING 2020.
[12] "CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT". Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng & Matthew P. Lungren. EMNLP 2020.
[13] "MIMIC-CXR Database (version 2.0.0)". Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. PhysioNet. 2019.
[14] "MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports". Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. Sci Data 6, 317. 2019.