Given an LLM response to a user question, locate each claimed fragment in the response and label its span by type (Ayah, Hadith matn, isnad, or claimed source), using character indexes. Every citation is treated as claimed, since it may be inaccurate.
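One way to picture the expected output is as a list of labeled character spans over the response string. The sketch below is illustrative only: the `Span` class, the `make_span` and `validate` helpers, the 0-based half-open index convention, and the no-overlap check are all assumptions, not the official submission format. The response text uses placeholders instead of real citations.

```python
from dataclasses import dataclass

# Label names come from the task description; everything else is hypothetical.
LABELS = ("Ayah", "matn", "isnad", "claimed source")


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character index (assumed 0-based)
    end: int    # exclusive character index (assumed convention)
    label: str

    def text(self, response: str) -> str:
        """Return the fragment of the response covered by this span."""
        return response[self.start:self.end]


def make_span(response: str, fragment: str, label: str) -> Span:
    """Build a span from the first occurrence of `fragment` in `response`."""
    start = response.index(fragment)  # raises ValueError if not found
    return Span(start, start + len(fragment), label)


def validate(spans, response: str) -> None:
    """Raise ValueError on unknown labels, out-of-range or overlapping spans."""
    prev_end = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.label not in LABELS:
            raise ValueError(f"unknown label: {span.label!r}")
        if not 0 <= span.start < span.end <= len(response):
            raise ValueError(f"span out of range: {span}")
        if span.start < prev_end:
            raise ValueError(f"overlapping span: {span}")
        prev_end = span.end


# Placeholder response: "<ayah text>" and "<source>" stand in for real content.
response = "Some answer text <ayah text> (<source>) and more text."
spans = [
    make_span(response, "<ayah text>", "Ayah"),
    make_span(response, "<source>", "claimed source"),
]
validate(spans, response)
```

Storing offsets rather than copied text keeps each label tied to an exact position in the response, which is what a character-level comparison needs.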
Participating systems are limited to models of 13B parameters or fewer.
In the example shown, the user asks about the meaning of tawḥīd and its three types.
We show the question along with the LLM response, which cites Qur'anic verses and hadiths. Two of its claimed citations are shown in context: one Ayah (with its claimed source) and one Hadith (with its isnad and matn). Fragments detected in the example above are as follows:
Macro-F1 at character level · 5 classes: Ayah / matn / isnad / claimed source / neither
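To make the metric concrete, here is a minimal sketch of character-level Macro-F1 over the five classes. It assumes spans are given as `(start, end, label)` tuples with half-open indexes, that every character outside a span is labeled `neither`, and that classes absent from both gold and prediction are left out of the average. The function names are hypothetical, and the official scorer may handle these details differently.

```python
CLASSES = ("Ayah", "matn", "isnad", "claimed source", "neither")


def char_labels(length, spans, default="neither"):
    """Expand (start, end, label) spans into one label per character."""
    labels = [default] * length
    for start, end, label in spans:
        for i in range(start, end):
            labels[i] = label
    return labels


def macro_f1(gold, pred, classes=CLASSES):
    """Unweighted mean of per-class F1 over per-character labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same characters")
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        if denom == 0:
            # Class appears in neither gold nor pred: skipped (an assumption).
            continue
        scores.append(2 * tp / denom)
    return sum(scores) / len(scores) if scores else 0.0


# Toy 10-character response: the prediction truncates the source span.
gold = char_labels(10, [(0, 4, "Ayah"), (6, 10, "claimed source")])
pred = char_labels(10, [(0, 4, "Ayah"), (6, 8, "claimed source")])
score = macro_f1(gold, pred)
```

Because every class counts equally in the average, a short isnad or claimed-source span weighs as much as the much more frequent `neither` characters.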