Textual Inference and Question Entailment in the Medical Domain


The MEDIQA challenge aims to attract further research efforts in Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and their applications in medical Question Answering (QA). This ACL-BioNLP 2019 shared task is motivated by the need to develop relevant methods, techniques, and gold standards for inference and entailment in the medical domain, and by their application to improving domain-specific IR and QA systems.


1) NLI: This first task consists of identifying the inference relation between two sentences as one of three labels: Entailment, Neutral, or Contradiction [1].

2) RQE: This task focuses on identifying entailment between two questions in the context of QA. We use the following definition of question entailment: "a question A entails a question B if every answer to B is also a complete or partial answer to A" [2].

3) Using NLI and RQE for QA: We provide lists of candidate answers retrieved by a medical QA system for a set of consumer health questions. This task consists of using NLI and/or RQE systems (the first two tasks) to enhance the QA system's results, e.g. to validate, filter, or re-rank the answers [3-5].
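One way to use an entailment system for the third task is score interpolation: mix each answer's original retrieval score with an entailment score between the question and the answer, then sort. The sketch below assumes a hypothetical `entailment_score` callable (not part of the task materials) returning a value in [0, 1]:

```python
from typing import Callable, List, Tuple

def rerank_answers(
    question: str,
    candidates: List[Tuple[str, float]],            # (answer text, retrieval score)
    entailment_score: Callable[[str, str], float],  # hypothetical NLI/RQE scorer in [0, 1]
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank retrieved answers by interpolating the original retrieval
    score with an entailment score for (question, answer)."""
    rescored = [
        (answer, (1 - weight) * score + weight * entailment_score(question, answer))
        for answer, score in candidates
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

Filtering (dropping answers below an entailment threshold) and validation fit the same interface; only the combination step changes.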

Data & Evaluation

Training sets:

  1. NLI: The MedNLI dataset including 14,049 clinical sentence pairs [1].
  2. RQE: The RQE collection containing 8,890 medical question pairs [2].
  3. QA: A set of consumer health questions and the associated lists of answers retrieved by the medical QA system CHiQA. (soon)

Validation sets: Three validation sets will be released in February 2019, with (i) labeled sentence pairs (task 1), (ii) labeled question pairs (task 2), and (iii) ranked answer lists (task 3), in the same format as the test sets.

Test sets: Three test sets will be available to the participants. Each team is allowed to submit a maximum of 5 runs for each task.

Evaluation measures: Accuracy for the first and second tasks, MAP and MRR for the third task.
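For the third task, Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) are standard ranked-retrieval measures. A minimal reference implementation, assuming each question's ranked answer list is given as binary relevance judgments (1 = relevant, 0 = not relevant):

```python
from typing import List

def mean_reciprocal_rank(results: List[List[int]]) -> float:
    """MRR: average over questions of 1/rank of the first relevant answer.

    `results` holds one relevance list per question, in ranked order.
    A question with no relevant answer contributes 0.
    """
    total = 0.0
    for rels in results:
        for rank, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(results)

def mean_average_precision(results: List[List[int]]) -> float:
    """MAP: mean over questions of the average precision at each
    rank where a relevant answer appears."""
    ap_sum = 0.0
    for rels in results:
        hits = 0
        precisions = []
        for rank, rel in enumerate(rels, start=1):
            if rel:
                hits += 1
                precisions.append(hits / rank)
        ap_sum += sum(precisions) / max(hits, 1)
    return ap_sum / len(results)
```

Both measures average over questions, so a system is rewarded for placing relevant answers near the top of every candidate list, not just for overall recall.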

Participation: MEDIQA'19 will be hosted on AIcrowd (previously crowdAI), a platform for open data science challenges that will be launched in mid-January 2019.



Important Dates (Tentative)

  • Mid-January 2019: Registration opens on the new AIcrowd platform.
  • Mid-February 2019: Release of the validation sets.
  • April 1, 2019: Release of the test sets for participants.
  • April 15, 2019: Run submission deadline. Participants' results will be available on AIcrowd.
  • May 15, 2019: Paper submission deadline.
  • August 1, 2019: BioNLP workshop, ACL 2019, Florence, Italy.


[1] A. Romanov & C. Shivade. Lessons from Natural Language Inference in the Clinical Domain. EMNLP 2018.

[2] A. Ben Abacha & D. Demner-Fushman. Recognizing Question Entailment for Medical Question Answering. AMIA 2016.

[3] A. Ben Abacha & D. Demner-Fushman. A Question-Entailment Approach to Question Answering. Information Processing and Management Journal, 2019 (under review).

[4] S. Harabagiu & A. Hickl. Methods for Using Textual Entailment in Open-Domain Question Answering. ACL 2006.

[5] A. Ben Abacha, E. Agichtein, Y. Pinter & D. Demner-Fushman. Overview of the Medical Question Answering Task at TREC 2017 LiveQA. TREC 2017.