Subtask 4: Answer Relevance

Task Definition

Given a user question and an LLM response with its correctly extracted Qur'anic and Hadith citation spans, classify each citation span with a binary label: Relevant or Non-relevant. A citation span is Relevant if it directly or indirectly answers the question. Citation spans that are topically related but do not answer the question, and citation spans that are entirely unrelated, are Non-relevant. A citation span may be authentic and accurately quoted yet still be Non-relevant if it does not contribute to answering the question.

This binary label is derived from a four-tier annotation rubric (direct answer, indirect answer, topically related with no answer, and non-relevant) developed for the ground-truth annotation of Qur'an and Hadith citations. Following the definition above, the first two tiers correspond to Relevant and the last two to Non-relevant.
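The collapse from the four-tier rubric to the binary label can be sketched as a simple lookup. This is only an illustration based on the task definition: the tier identifiers (`direct_answer`, `indirect_answer`, and so on) and the function name `to_binary_label` are hypothetical, and the official annotation files may encode the tiers differently.

```python
# Hypothetical tier identifiers; the released data may name them differently.
TIER_TO_LABEL = {
    "direct_answer": "relevant",
    "indirect_answer": "relevant",
    "topically_related_no_answer": "non-relevant",
    "non_relevant": "non-relevant",
}


def to_binary_label(tier: str) -> str:
    """Collapse a four-tier rubric judgment into the binary task label."""
    try:
        return TIER_TO_LABEL[tier]
    except KeyError:
        # Fail loudly on unexpected tier names instead of guessing a label.
        raise ValueError(f"unknown rubric tier: {tier!r}") from None
```

Under this mapping, a citation span that shares the question's topic but does not answer it receives the same label as an entirely unrelated span.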

Participating systems are limited to models of 13B parameters or fewer.

Example

Evaluation Measures

Submissions will be evaluated using Macro-F1, computed per question and then averaged across all questions. Qur'an and Hadith citations are scored separately.
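A minimal sketch of this measure follows. It is not the official scorer. The record layout `(source, question_id, gold_label, predicted_label)`, the source names `"quran"` and `"hadith"`, and the function names are assumptions made for illustration. One further assumption matters for the numbers: within a question, F1 is averaged only over labels that occur in the gold or predicted labels for that question, so a question with a single label type is not penalized for the absent class. The official scorer may handle that case differently.

```python
from collections import defaultdict


def f1_for_label(gold, pred, label):
    """F1 for one label, computed from paired gold/predicted labels."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over labels seen in gold or pred.

    Assumption: labels absent from both lists are left out of the average.
    """
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)


def score(records):
    """Average per-question Macro-F1, reported separately for each source.

    records: iterable of (source, question_id, gold_label, predicted_label).
    """
    grouped = defaultdict(lambda: ([], []))
    for source, qid, gold, pred in records:
        golds, preds = grouped[(source, qid)]
        golds.append(gold)
        preds.append(pred)

    per_source = defaultdict(list)
    for (source, _qid), (golds, preds) in grouped.items():
        per_source[source].append(macro_f1(golds, preds))

    return {src: sum(vals) / len(vals) for src, vals in per_source.items()}


# Toy data, invented purely to exercise the functions above.
records = [
    ("quran", "q1", "relevant", "relevant"),
    ("quran", "q1", "non-relevant", "relevant"),
    ("quran", "q2", "relevant", "relevant"),
    ("hadith", "q1", "non-relevant", "non-relevant"),
    ("hadith", "q1", "relevant", "non-relevant"),
]
scores = score(records)
```

Grouping by `(source, question_id)` keeps the two citation types apart before any averaging, which matches the statement that Qur'an and Hadith are scored separately.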
