Welcome to IslamicEval 2026
Shared Task!
Fine-grained Hallucination Detection
in Arabic Islamic Content

@ArabicNLP 2026, co-located with EMNLP 2026

Budapest, Hungary, October 2026

Finding, verifying, repairing, and assessing the relevance
of Qur'an and Hadith citations in LLM responses

About the Task

When a model quotes scripture, is it telling the truth?

IslamicEval 2026 is the second edition of the shared task, extending IslamicEval 2025. When people ask large language models religious questions in Arabic, the models often answer with citations from the Qur'an and Hadith that are misquoted, altered, misattributed, or fabricated outright. Because a fabricated verse can read as entirely natural, the errors are easy to produce and hard to catch — and in a domain where authenticity is paramount, the consequences are serious.

Building on the 2025 pipeline, this edition takes a finer-grained approach. It breaks each citation into four fragment types — Ayah, Hadith matn, isnad, and claimed source — for a more precise view of where hallucination occurs, and adds a dedicated relevance subtask. Each subtask is independent and ships with its own dataset, so a team may enter any one of them on its own.

Participating systems are limited to models of 13B parameters or fewer.

Subtasks Overview

Subtask 1 Span Detection: Find the spans of claimed fragments and label each as Ayah, Hadith matn, isnad, or claimed source.
Subtask 2 Hallucination Identification: Label each fragment as correct or incorrect. An isnad or claimed source is judged only when its corresponding Ayah or matn is correct; otherwise it is labeled N/A.
Subtask 3 Hallucination Correction: Provide the canonical text for incorrect Ayahs and Hadith matns. Only these two fragment types are covered.
Subtask 4 Answer Relevance: Decide whether a citation is actually relevant to answering the question it was given for.
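
The dependency rule in Subtask 2 can be illustrated with a minimal Python sketch. This is not the official data format or scorer: the Fragment class, its field names, the character-offset convention, and the subtask2_label helper are all hypothetical, introduced only to show how an isnad or claimed source falls back to N/A when the Ayah or matn it belongs to is incorrect.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four fragment types named in the task description.
AYAH, MATN, ISNAD, SOURCE = "Ayah", "Hadith matn", "isnad", "claimed source"


@dataclass
class Fragment:
    """Hypothetical record for one detected fragment (not the official schema)."""
    kind: str                     # one of the four fragment types above
    start: int                    # assumed: character offset where the span begins
    end: int                      # assumed: character offset where the span ends
    is_correct: bool              # verdict on the fragment taken by itself
    parent: Optional[int] = None  # index of the Ayah/matn an isnad/source attaches to


def subtask2_label(fragments: List[Fragment], i: int) -> str:
    """Return 'correct', 'incorrect', or 'N/A' for fragment i."""
    frag = fragments[i]
    # An isnad or claimed source is judged only if its Ayah or matn is correct.
    if frag.kind in (ISNAD, SOURCE) and frag.parent is not None:
        if not fragments[frag.parent].is_correct:
            return "N/A"
    return "correct" if frag.is_correct else "incorrect"


# Illustrative values only: an altered Ayah with its claimed source,
# followed by a Hadith whose isnad and matn are both assumed correct here.
example = [
    Fragment(AYAH, 10, 40, is_correct=False),
    Fragment(SOURCE, 41, 60, is_correct=True, parent=0),
    Fragment(ISNAD, 70, 90, is_correct=True, parent=3),
    Fragment(MATN, 91, 140, is_correct=True),
]
labels = [subtask2_label(example, i) for i in range(len(example))]
# labels == ["incorrect", "N/A", "correct", "correct"]
```

The claimed source is labeled N/A even though its own verdict is marked correct, because the Ayah it is attached to is incorrect. Refer to the subtask pages for the actual annotation format.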

Example

In the example shown, the user asks about the meaning of tawḥīd and its three types. We show the question along with the LLM response, which cites Qur'anic verses and hadiths. The same response is used across Subtasks 1-3. Two of its claimed citations are shown in context: one Ayah (with its claimed source) and one Hadith (with its isnad and matn).

In this example, the Ayah has been altered, so it is incorrect; everything attached to an incorrect verse or matn is judged N/A. More details are available on each subtask page.

Registration

The registration form is available here. Registration is required for participation.

Dataset

The development sets for the subtasks will be released soon.

Timeline

12 July 2026 01 July 2026: Release of training/development data and evaluation scripts
20 July 2026: Registration deadline (Register HERE)
23 July 2026: Test set release
01 August 2026: System submission · final evaluation
08 August 2026: System description papers due

Organizers

Rahaf Alharbi, University of Edinburgh
Abdulelah Alturki, University of Edinburgh
Rana Malhas, Qatar University
Watheq Mansour, University of Queensland
Hamdy Mubarak, QCRI, HBKU
Kareem Darwish, QCRI, HBKU
Tamer Elsayed, Qatar University
Walid Magdy, University of Edinburgh
