Comprehensive Arabic Health Question Answering Shared Task
AraHealthQA 2025
Part of The Third Arabic Natural Language Processing Conference
(ArabicNLP 2025) Co-located with EMNLP 2025
Date TBD
Suzhou, China
Comprehensive Arabic Health Question Answering Shared Task (AraHealthQA 2025)
(OpenReview of Shared Task to be updated)
(Please create your OpenReview account with your university email address; otherwise, it may take up to two weeks before you can submit your paper.)
Introduction and Motivation
Large Language Models (LLMs) have shown substantial potential across a variety of healthcare applications. Despite this progress, their effectiveness in the Arabic medical domain remains significantly underexplored, primarily due to a lack of high-quality, domain-specific datasets and benchmarking efforts. To address this gap, AraHealthQA 2025 introduces a new shared task designed to evaluate and advance the performance of LLMs on Arabic medical question answering tasks. This shared task aims to catalyze research into the development and evaluation of LLMs for Arabic medical applications, with a particular focus on both general health and mental health domains. By providing curated datasets and a structured evaluation framework, AraHealthQA 2025 facilitates the benchmarking of models under realistic, multilingual, and culturally contextualized healthcare scenarios.
Scope of the Shared Task
AraHealthQA 2025 consists of two distinct but complementary tracks:
Track 1: Arabic Mental Health QA (MentalQA)
This track focuses on mental health-related topics such as anxiety, depression, cognitive disorders, therapeutic practices, and stigma reduction. It includes the following sub-tasks:
Multi-label question categorization: CodaLab link
Multi-label answer categorization: CodaLab link
Patient-doctor question answering: CodaLab link
The dataset consists of 350 question-answer pairs for training and development (available here: link), and 150 pairs for testing.
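In the two categorization sub-tasks, "multi-label" means that each question or answer may carry several categories at once, so a prediction is a set of labels rather than a single class. The official evaluation metrics are not stated here. The sketch below therefore assumes two common choices, per-example Jaccard overlap and micro-averaged F1, and uses hypothetical label names ("A", "B", ...) purely for illustration.

```python
from typing import List, Set


def jaccard_score(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Mean per-example Jaccard overlap between gold and predicted label sets."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    total = 0.0
    for g, p in zip(gold, pred):
        union = g | p
        # Two empty sets agree perfectly, so count them as a full match.
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all examples."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical labels; the real category inventory comes with the dataset.
gold_labels = [{"A", "B"}, {"C"}]
pred_labels = [{"A"}, {"C", "D"}]
print(jaccard_score(gold_labels, pred_labels))
print(micro_f1(gold_labels, pred_labels))
```

Participants should check the CodaLab pages for the metrics actually used before relying on either score.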
Track 2: General Arabic Health QA (MedArabiQ)
This track covers a wide array of medical domains such as internal medicine, cardiology, pediatrics, and medical education. It includes the following sub-tasks:
Multiple choice question answering: Codabench link
Open-ended question answering: Codabench link
This track includes a development set of 700 questions and a test set of 200 multiple-choice questions.
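For the multiple-choice sub-task, a natural way to sanity-check a system on the development set is plain accuracy over the selected options. The scoring rule and answer format are not specified here, so the sketch below is only an assumption: it treats each answer as a single option letter and compares it case-insensitively after trimming whitespace. The function name `mcq_accuracy` is hypothetical.

```python
from typing import List


def mcq_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of questions where the predicted option matches the gold option."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(
        g.strip().upper() == p.strip().upper()  # tolerate case and stray spaces
        for g, p in zip(gold, pred)
    )
    return correct / len(gold)


# Hypothetical option letters for four questions.
print(mcq_accuracy(["A", "C", "B", "D"], ["a", "C ", "D", "D"]))
```

If the official data uses Arabic option letters or full answer strings, the normalization step would need to be adapted accordingly.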
For questions about the shared task, please email hrhuzali@uqu.edu.sa (Track 1 lead) or farah.shamout@nyu.edu (Track 2 lead).