AraHealthQA 2025

Comprehensive Arabic Health Question Answering Shared Task

Part of the Third Arabic Natural Language Processing Conference
(ArabicNLP 2025), co-located with EMNLP 2025

Date TBD

Suzhou, China

Comprehensive Arabic Health Question Answering Shared Task (AraHealthQA 2025)

(OpenReview of Shared Task to be updated)

(Please create your OpenReview account using your university email; otherwise, it may take up to two weeks before you can submit your paper.)

Introduction and Motivation

Large Language Models (LLMs) have shown substantial potential across a variety of healthcare applications. Their effectiveness in the Arabic medical domain, however, remains significantly underexplored, primarily because high-quality, domain-specific datasets and benchmarking efforts are lacking. To address this gap, AraHealthQA 2025 introduces a new shared task designed to evaluate and advance the performance of LLMs on Arabic medical question answering. The task aims to catalyze research on developing and evaluating LLMs for Arabic medical applications, with a particular focus on the general health and mental health domains. By providing curated datasets and a structured evaluation framework, AraHealthQA 2025 supports benchmarking models under realistic, multilingual, and culturally contextualized healthcare scenarios.

Scope of the Shared Task

AraHealthQA 2025 consists of two distinct but complementary tracks:

Track 1: Arabic Mental Health QA (MentalQA)

This track focuses on mental health-related topics such as anxiety, depression, cognitive disorders, therapeutic practices, and stigma reduction. It includes the following sub-tasks:

- Multi-label question categorization: CodaLab link
- Multi-label answer categorization: CodaLab link
- Patient-doctor question answering: CodaLab link
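
As an illustration of what "multi-label" means in the first two sub-tasks, the sketch below represents each question's categories as a set of labels and scores predictions by their average set overlap (Jaccard index). The label names are hypothetical placeholders, and the choice of the Jaccard index is an assumption made for illustration only; this page does not state the official label inventory or evaluation metric, so please consult the CodaLab pages for those.

```python
from typing import List, Set


def jaccard(gold: Set[str], predicted: Set[str]) -> float:
    """Overlap between two label sets; defined as 1.0 when both are empty."""
    if not gold and not predicted:
        return 1.0
    return len(gold & predicted) / len(gold | predicted)


def mean_jaccard(gold_sets: List[Set[str]], predicted_sets: List[Set[str]]) -> float:
    """Average Jaccard overlap across all examples."""
    if len(gold_sets) != len(predicted_sets):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold_sets, predicted_sets))
    return sum(jaccard(g, p) for g, p in pairs) / len(pairs)


# Hypothetical labels: NOT the task's actual categories.
gold = [{"label_a", "label_b"}, {"label_c"}]
predicted = [{"label_a"}, {"label_c"}]

score = mean_jaccard(gold, predicted)
print(f"mean Jaccard: {score:.2f}")
```

In this toy example the first question is only partially matched (one of its two gold labels is predicted), while the second is matched exactly, so a multi-label system can receive partial credit for each question.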

The dataset consists of 350 question-answer pairs for training and development (available here: link), and 150 pairs for testing.

Track 2: General Arabic Health QA (MedArabiQ)

This track covers a wide array of medical domains such as internal medicine, cardiology, pediatrics, and medical education. It includes the following sub-tasks:

- Multiple choice question answering: Codabench link
- Open-ended question answering: Codabench link

This track includes a development set of 700 questions and a test set of 200 multiple-choice questions.

Shared Task Registration Link

Contact

If you have specific questions about the shared task, please email hrhuzali@uqu.edu.sa (Track 1 Lead) or farah.shamout@nyu.edu (Track 2 Lead).