Summarization in the Medical Domain
MEDIQA 2021 tackles three summarization tasks in the medical domain: consumer health question summarization, multi-answer summarization, and radiology report summarization. In this shared task, we will also explore the use of different evaluation metrics for summarization.
MEDIQA 2021 will be organized at the NAACL-BioNLP 2021 workshop.
Join our mailing list: https://groups.google.com/g/mediqa-nlp
1) Summarization of Consumer Health Questions
Consumer health questions often contain substantial peripheral information that hinders automatic question answering (QA). Empirical QA studies showed that manual expert summarization of these questions yields a substantial 58% improvement in performance. Effective automatic summarization methods for consumer health questions could therefore play a key role in enhancing medical question answering.
The goal of this task is to promote the development of new summarization approaches that address specifically the challenges of long and potentially complex consumer health questions.
Relevant approaches should be able to generate a condensed question expressing the minimum information required to find correct answers to the original question.
2) Summarization of Multiple Answers
Different answers can bring complementary perspectives that are likely to benefit the users of QA systems. The goal of this task is to promote the development of multi-answer summarization approaches that can simultaneously solve the aggregation and summarization problems posed by multiple relevant answers to a medical question.
3) Summarization of Radiology Reports
The automatic summarization of radiology reports has several clinical applications such as accelerating the radiology workflow and improving the efficiency of clinical communications.
This task aims to promote the development of clinical summarization models that are able to generate radiology impression statements by summarizing textual findings written by radiologists [7-8].
Task 1: Question Summarization
Training Data: The MeQSum Dataset of consumer health questions and their summaries could be used for training. Participants can use available external resources, including, but not limited to, medical QA datasets and question focus & type recognition datasets. For instance, the CHQs Dataset contains additional annotations of the MeQSum questions (e.g., medical entities, question focus, question type, keywords).
Validation and Test Sets: Consist of consumer health questions received by the U.S. National Library of Medicine (NLM) in December 2020 and their associated summaries, manually created by medical experts.
The validation set is available on the MEDIQA Github project.
Task 2: Multi-Answer Summarization
Validation and Test Sets: The original answers are generated by the medical QA system CHiQA, which retrieves answers only from trustworthy medical information sources. The summaries are manually created by medical experts.
The validation set is available on the MEDIQA Github project.
Task 3: Radiology Report Summarization
Training Data: A subset from the MIMIC-CXR Dataset [13,14] could be used for training. Instructions and scripts to download this training set are described here: https://github.com/abachaa/MEDIQA2021/tree/main/Task3.
Participants can use available external resources. Please note, however, that the rest of the MIMIC-CXR reports, as well as the Indiana dataset, must not be used for training.
Validation set: A subset from the MIMIC-CXR and Indiana datasets, available on the MEDIQA Github project.
The registration & data usage agreement form is available under the Resources section of the AIcrowd projects.
To register, you need to complete, sign, and upload the form. When approved, you will be able to download the official test sets and to submit your runs on the AIcrowd submission systems.
Submission & Evaluation
Submission Format for the three tasks:
ID [tab] Summary
The summary must fit in one line (no line breaks).
Each team is allowed to submit 10 runs for each task.
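The run-file format above (one `ID [tab] Summary` pair per line, with no line breaks inside a summary) can be checked before submission with a short script. The helper below is an illustrative sketch, not an official validator:

```python
def validate_run(path):
    """Check a run file: each line must be 'ID<TAB>Summary' with both fields non-empty."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                errors.append(f"line {i}: expected 'ID<TAB>Summary', got {len(parts)} field(s)")
            elif not parts[0].strip() or not parts[1].strip():
                errors.append(f"line {i}: empty ID or summary")
    return errors
```

An empty list means the file matches the expected format.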
ROUGE will be used as the main metric to rank the participating teams, but we will also use several evaluation metrics more adapted to each task, such as HOLMS and CheXbert.
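Official scoring will be done by the organizers; as a rough local sanity check, ROUGE-1 F-measure can be computed with a few lines of code. The minimal sketch below assumes a simplified tokenization (lowercased whitespace split) and is not a substitute for the official ROUGE toolkit:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F-measure over lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For system development, the full ROUGE package (with stemming and ROUGE-2/ROUGE-L variants) should be preferred over this simplified version.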
 "On the Role of Question Summarization and Information Source Restriction in Consumer Health Question Answering". Asma Ben Abacha & Dina Demner-Fushman. AMIA 2019 Informatics Summit.
 "Semantic Annotation of Consumer Health Questions". Halil Kilicoglu, Asma Ben Abacha, Yassine Mrabet, Sonya E Shooshan, Laritza Rodriguez, Kate Masterton & Dina Demner-Fushman. BMC Bioinformatics, 2018. CHQs Dataset
 "Question-Driven Summarization of Answers to Consumer Health Questions". Max E. Savery, Asma Ben Abacha, Soumya Gayen & Dina Demner-Fushman. Scientific Data, Nature, 2020. MEDIQA-AnS Dataset.
 "Consumer health information and question answering: helping consumers find answers to their health-related information needs". Dina Demner-Fushman, Yassine Mrabet & Asma Ben Abacha. JAMIA 2020.
 "A Question-Entailment Approach to Question Answering". Asma Ben Abacha & Dina Demner-Fushman. BMC Bioinformatics, 2019. MedQuAD Dataset.
 "Learning to Summarize Radiology Findings". Yuhao Zhang, Daisy Yi Ding, Tianpei Qian, Christopher D. Manning & Curtis P. Langlotz. EMNLP-LOUHI 2020.
 "Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports". Yuhao Zhang, Derek Merck, Emily Bao Tsai, Christopher D. Manning & Curtis P. Langlotz. ACL 2020.
 "ROUGE: A Package for Automatic Evaluation of Summaries". Chin-Yew Lin. ACL 2004.
 "Re-evaluating Evaluation in Text Summarization". Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu & Graham Neubig. EMNLP 2020.
 " HOLMS: Alternative Summary Evaluation with Large Language Models". Yassine Mrabet & Dina Demner-Fushman. COLING 2020.
 "CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT". Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng & Matthew P. Lungren. EMNLP 2020.
 "MIMIC-CXR Database (version 2.0.0)". Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. PhysioNet. 2019.
 "MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports". Johnson, A.E.W., Pollard, T.J., Berkowitz, S.J. et al. Sci Data 6, 317. 2019.