Subtask 2: Hallucination Identification

Task Definition

Given an LLM response to a user question, decide for each claimed fragment identified in Subtask 1 whether it is correct or incorrect. Correctness is anchored on the text of the Ayah and the Hadith matn. An isnad or claimed source is judged only when the Ayah or matn it is attached to is correct; otherwise, it is labelled N/A and excluded from evaluation.
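The labelling rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the organizers' code: the function name `label_citation`, its arguments, and the exact label strings are assumptions made for the example.

```python
# Hypothetical label strings; the official data format may differ.
CORRECT, INCORRECT, NA = "correct", "incorrect", "N/A"


def label_citation(text_is_correct: bool, attribution_is_correct: bool) -> dict:
    """Label a citation's text and its attribution.

    "text" stands for the Ayah or the Hadith matn; "attribution" stands for
    the claimed source or the isnad attached to that text.
    """
    text_label = CORRECT if text_is_correct else INCORRECT
    if not text_is_correct:
        # The attribution is not judged when the text itself is wrong.
        attribution_label = NA
    else:
        attribution_label = CORRECT if attribution_is_correct else INCORRECT
    return {"text": text_label, "attribution": attribution_label}
```

With this sketch, an altered Ayah yields an N/A source regardless of whether the cited surah and verse exist, while a correct matn lets its isnad be judged independently.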

Participating systems are limited to models of 13B parameters or fewer.

Example

In this example, the user asks about the meaning of tawḥīd and its three types. The accompanying figure shows the question along with the LLM response. Subtask 1 identified two claimed citations in the response: one Ayah (with its claimed source) and one Hadith (with its isnad and matn). In the figure, the Ayah's words have been altered, so it is labelled incorrect, and its attached claimed source (al-Zumar 38) becomes N/A and is excluded from evaluation. The Hadith matn is correct, so its isnad is judged on its own and is labelled correct.

Evaluation Measures

The official measure for this subtask is accuracy per label, computed after excluding N/A items.
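A minimal sketch of such a computation follows. It assumes that "per label" means accuracy is reported separately for each fragment type (for example Ayah, claimed source, matn, isnad) and that N/A is decided by the gold annotation; both are interpretations, and the official scorer may differ. The function name `accuracy_per_label` and the tuple layout are hypothetical.

```python
from collections import defaultdict

NA = "N/A"  # hypothetical marker for fragments excluded from evaluation


def accuracy_per_label(items):
    """Compute accuracy separately for each fragment type.

    items: iterable of (fragment_type, gold_label, predicted_label) tuples.
    Fragments whose gold label is N/A are skipped entirely.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for fragment_type, gold, predicted in items:
        if gold == NA:
            continue  # excluded from evaluation
        totals[fragment_type] += 1
        hits[fragment_type] += int(gold == predicted)
    return {t: hits[t] / totals[t] for t in totals}


# Gold labels follow the tawḥīd example; the predictions are made up.
example = [
    ("ayah", "incorrect", "incorrect"),
    ("source", "N/A", "correct"),      # skipped: gold is N/A
    ("matn", "correct", "correct"),
    ("isnad", "correct", "incorrect"),
]
scores = accuracy_per_label(example)
```

Because the claimed source is N/A in the gold annotation, it contributes to no score, so a system is neither rewarded nor penalized for whatever it predicts there.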
