When a model quotes scripture, is it telling the truth?
IslamicEval 2026 is the second edition of the shared task, extending IslamicEval 2025. When people ask large language models religious questions in Arabic, the models often answer with citations from the Qur'an and Hadith that are misquoted, altered, misattributed, or fabricated outright. Because a fabricated verse can read as entirely natural, the errors are easy to produce and hard to catch. In a domain where authenticity is paramount, the consequences are serious.
Building on the 2025 pipeline, this edition takes a finer-grained approach. It breaks each citation into four fragment types (Ayah, Hadith matn, isnad, and claimed source) for a more precise view of where hallucination occurs, and adds a dedicated relevance subtask. Each subtask is independent and ships with its own dataset, so a team may enter any one of them on its own.
Participating systems are limited to models of 13B parameters or fewer.
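
To make the four-way fragment breakdown concrete, here is a minimal sketch of how a participant might represent citation fragments in Python. It is not the official data format: the names FragmentType, Fragment, and group_by_type, the character-offset fields, and the placeholder strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class FragmentType(Enum):
    """The four fragment types named in the task description."""
    AYAH = "ayah"
    HADITH_MATN = "hadith_matn"
    ISNAD = "isnad"
    CLAIMED_SOURCE = "claimed_source"


@dataclass(frozen=True)
class Fragment:
    """One labelled span of a model response (hypothetical schema)."""
    kind: FragmentType
    text: str
    start: int  # assumed: character offset where the span begins
    end: int    # assumed: character offset where the span ends (exclusive)


def group_by_type(fragments):
    """Bucket fragments by type so each type can be checked separately."""
    grouped = {kind: [] for kind in FragmentType}
    for frag in fragments:
        grouped[frag.kind].append(frag)
    return grouped


# Placeholder spans only; no real scripture text is reproduced here.
example = [
    Fragment(FragmentType.AYAH, "<ayah text>", 0, 11),
    Fragment(FragmentType.CLAIMED_SOURCE, "<claimed source>", 12, 28),
]
by_type = group_by_type(example)
```

Keeping the fragment type explicit on every span is what would let a system report where a hallucination occurs, for example an authentic matn paired with a fabricated isnad, rather than only flagging the whole citation.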