Textual Inference and Question Entailment in the Medical Domain


The MEDIQA challenge aims to attract further research on Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and their applications to medical Question Answering (QA). This ACL-BioNLP 2019 shared task is motivated by the need to develop methods, techniques, and gold standards for inference and entailment in the medical domain, and to apply them to improve domain-specific information retrieval (IR) and QA systems.


1) NLI: This first task consists of classifying the inference relation between two sentences into one of three classes: Entailment, Neutral, or Contradiction [1].

2) RQE: This task focuses on identifying entailment between two questions in the context of QA. We use the following definition of question entailment: "a question A entails a question B if every answer to B is also a complete or partial answer to A" [2].

3) QA: The objective of this task is to filter and improve the ranking of automatically retrieved answers. The input rankings are generated by the medical QA system CHiQA. We highly recommend reusing the RQE and/or NLI systems developed for the first two tasks in the QA task [3-5].
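To make the RQE definition in task 2 concrete, here is a toy Python sketch. It assumes, purely for illustration and not as part of the task, that a question can be represented by the set of information "facets" it asks about, and that a complete or partial answer addresses a non-empty subset of those facets. The facet labels and function names are hypothetical. The `entails` function follows the wording of the definition literally by enumerating every answer to B.

```python
from itertools import combinations
from typing import FrozenSet, List

# Toy model (illustrative assumption, not part of the task definition):
# a question is the set of information "facets" it asks about, and an
# answer is the non-empty set of facets it addresses.
Facets = FrozenSet[str]


def possible_answers(question: Facets) -> List[Facets]:
    """All complete or partial answers to `question`: every non-empty subset of its facets."""
    items = sorted(question)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def is_complete_or_partial_answer(answer: Facets, question: Facets) -> bool:
    """True if `answer` addresses at least one facet of `question` and nothing else."""
    return bool(answer) and answer <= question


def entails(a: Facets, b: Facets) -> bool:
    """A entails B if every answer to B is also a complete or partial answer to A."""
    return all(is_complete_or_partial_answer(x, a) for x in possible_answers(b))


# Hypothetical facets: A ~ "What are the symptoms and treatments of shingles?",
# B ~ "What is the treatment for shingles?"
A = frozenset({"shingles:symptoms", "shingles:treatment"})
B = frozenset({"shingles:treatment"})
print(entails(A, B))  # True
print(entails(B, A))  # False: a symptoms-only answer to A does not answer B
```

Enumerating answers is exponential in the number of facets and is done only to mirror the definition; in this toy model the check reduces to the subset test `b <= a`. Real RQE systems must of course infer this relation from the question text.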

Paper: Overview of the MEDIQA 2019 Shared Task [6].

Data & Evaluation

All datasets and evaluation scripts are available at [6].

Training sets:

  1. NLI: The MedNLI dataset of 14,049 clinical sentence pairs [1]. Important: participants must obtain access to MIMIC in order to access MedNLI and the NLI test set.
  2. RQE: The RQE collection containing 8,588 medical question pairs [2].
  3. QA: Two sets of medical questions and the associated lists of answers retrieved by the medical QA system CHiQA and reranked manually:

In addition, the MedQuAD dataset of 47k question-answer pairs can be used to retrieve answered questions that are entailed by the original questions [3].
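The retrieval idea above (find answered questions entailed by the user's question, then return their answers) can be sketched as follows. This is a minimal illustration, not the method of [3]: the `QAPair` type, the toy collection, the 0.2 threshold, and the `jaccard` scorer are all hypothetical. Token-set Jaccard similarity is a symmetric stand-in used only so the example runs; a trained RQE or NLI model from the first two tasks would be plugged in via `rqe_score` and, unlike Jaccard, would be directional.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


def jaccard(q1: str, q2: str) -> float:
    """Stand-in for a trained RQE model: token-set Jaccard similarity."""
    t1, t2 = set(q1.lower().split()), set(q2.lower().split())
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0


def retrieve_entailed(user_question: str, collection: List[QAPair],
                      rqe_score: Callable[[str, str], float] = jaccard,
                      threshold: float = 0.2) -> List[QAPair]:
    """Keep answered questions whose score passes the threshold, best first.

    Python's sort is stable (also with reverse=True), so ties keep collection order.
    """
    scored = [(rqe_score(user_question, p.question), p) for p in collection]
    kept = [(s, p) for s, p in scored if s >= threshold]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in kept]


# Hypothetical toy collection standing in for MedQuAD-style question-answer pairs.
collection = [
    QAPair("what is shingles", "<answer 1>"),
    QAPair("what are the treatments for shingles", "<answer 2>"),
    QAPair("how to prevent diabetes", "<answer 3>"),
]
for pair in retrieve_entailed("what are the treatments for shingles", collection):
    print(pair.question, "->", pair.answer)
```

Filtering and ranking are kept as separate steps so that either the scorer or the threshold can be swapped independently; the same pattern applies to re-scoring the CHiQA answer lists in the QA task.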

Validation and test sets:

  1. NLI: datasets and submission on NLI@AICrowd.
  2. RQE: datasets and submission on RQE@AICrowd.
  3. QA: datasets and submission on QA@AICrowd.

Evaluation measures: Accuracy for the NLI and RQE tasks. For the QA task: Mean Reciprocal Rank (MRR), Accuracy, Precision, and Spearman's Rank Correlation Coefficient.
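The measures listed above can be sketched with their common textbook definitions. This is not the official scorer; the evaluation scripts [6] are authoritative. Several details here are assumptions: Accuracy and Precision for QA are computed on per-answer correct/incorrect labels, a question with no correct answer contributes a reciprocal rank of 0, and Spearman's coefficient uses the closed form for rankings without ties, $\rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}$, where $d_i$ is the difference between the system and reference rank of item $i$.

```python
from typing import List, Sequence


def accuracy(pred: Sequence, gold: Sequence) -> float:
    """Fraction of items whose predicted label equals the gold label (NLI, RQE)."""
    assert len(pred) == len(gold) and gold
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def precision(pred: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of answers labelled correct by the system that are actually correct."""
    true_pos = sum(p and g for p, g in zip(pred, gold))
    predicted_pos = sum(pred)
    return true_pos / predicted_pos if predicted_pos else 0.0


def mean_reciprocal_rank(rankings: List[List[bool]]) -> float:
    """Each inner list gives answer correctness in system order.

    Reciprocal rank = 1 / position of the first correct answer (0 if none).
    """
    total = 0.0
    for ranked in rankings:
        for position, is_correct in enumerate(ranked, start=1):
            if is_correct:
                total += 1.0 / position
                break
    return total / len(rankings)


def spearman_rho(system_rank: Sequence[int], reference_rank: Sequence[int]) -> float:
    """Spearman's rank correlation, closed form valid only when there are no ties."""
    n = len(system_rank)
    assert n == len(reference_rank) and n > 1
    d_squared = sum((s - r) ** 2 for s, r in zip(system_rank, reference_rank))
    return 1 - 6 * d_squared / (n * (n * n - 1))


print(accuracy(["entailment", "neutral", "contradiction"],
               ["entailment", "contradiction", "contradiction"]))
print(precision([True, True, False], [True, False, False]))      # 0.5
print(mean_reciprocal_rank([[False, True, False], [True, False]]))  # 0.75
print(spearman_rho([1, 2, 3, 4], [1, 2, 4, 3]))
```

For rankings with ties, a library routine such as `scipy.stats.spearmanr`, which ranks with averaged ties, is a safer choice than the closed form.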




Important Dates

  • February 8, 2019: AICrowd projects go public: NLI@AICrowd, RQE@AICrowd & QA@AICrowd.
  • February 28, 2019: Release of the RQE validation set, run submission open.
  • March 19, 2019: Release of the QA validation set.
  • April 10, 2019: Run submission open on the QA validation set.
  • April 15, 2019: Release of the test sets for the 3 tasks.
  • April 30, 2019: Run submission deadline. Participants' results will be available on AIcrowd.
  • May 15, 2019: Paper submission deadline.
  • May 31, 2019: Notification of acceptance.
  • June 6, 2019: Camera-ready copy due (firm deadline due to the ACL schedule).
  • August 1, 2019: BioNLP workshop, ACL 2019, Florence, Italy.


[1] A. Romanov & C. Shivade. Lessons from Natural Language Inference in the Clinical Domain. EMNLP 2018.

[2] A. Ben Abacha & D. Demner-Fushman. Recognizing Question Entailment for Medical Question Answering. AMIA 2016.

[3] A. Ben Abacha & D. Demner-Fushman. A Question-Entailment Approach to Question Answering. arXiv:1901.08079 [cs.CL], January 2019.

[4] S. Harabagiu & A. Hickl. Methods for using textual entailment in open-domain question answering. ACL 2006.

[5] A. Ben Abacha, E. Agichtein, Y. Pinter & D. Demner-Fushman. Overview of the Medical Question Answering Task at TREC 2017 LiveQA. TREC 2017.

[6] A. Ben Abacha, C. Shivade & D. Demner-Fushman. Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering. ACL-BioNLP 2019.