Task B: Reading Comprehension

For the second edition of our shared task, we propose two sub-tasks: Task A is a Qur'anic passage retrieval (PR) task, while Task B is a machine reading comprehension (RC) task. We describe and formally define Task B below.

Task Definition

The task is defined as follows: given a Qur'anic passage that consists of consecutive verses in a specific Surah of the Holy Qur'an, and a free-text question posed in Modern Standard Arabic (MSA) over that passage, a system is required to extract all answers to that question that are stated in the given passage (rather than any one answer, as in Qur'an QA 2022). Each answer must be a span of text extracted from the given passage. The question can be a factoid or non-factoid question. An example is shown below.

To make the task more realistic (and thus more challenging), some questions may not have an answer in the given passage. In such cases, the ideal system should return no answers; otherwise, it should return a ranked list of up to 10 answer spans.

Qur’anic Passage (74:32-48) الفقرة القرآنية

كَلَّا وَٱلْقَمَرِ. وَٱلَّيْلِ إِذْ أَدْبَرَ. وَٱلصُّبْحِ إِذَآ أَسْفَرَ. إِنَّهَا لَإِحْدَى ٱلْكُبَرِ. نَذِيرًا لِّلْبَشَرِ. لِمَن شَآءَ مِنكُمْ أَن يَتَقَدَّمَ أَوْ يَتَأَخَّرَ. كُلُّ نَفْسٍۭ بِمَا كَسَبَتْ رَهِينَةٌ. إِلَّآ أَصْحَٰبَ ٱلْيَمِينِ. فِى جَنَّٰتٍ يَتَسَآءَلُونَ. عَنِ ٱلْمُجْرِمِينَ. مَا سَلَكَكُمْ فِى سَقَرَ. قَالُوا۟ لَمْ نَكُ مِنَ ٱلْمُصَلِّينَ. وَلَمْ نَكُ نُطْعِمُ ٱلْمِسْكِينَ. وَكُنَّا نَخُوضُ مَعَ ٱلْخَآئِضِينَ. وَكُنَّا نُكَذِّبُ بِيَوْمِ ٱلدِّينِ. حَتَّىٰٓ أَتَىٰنَا ٱلْيَقِينُ. فَمَا تَنفَعُهُمْ شَفَٰعَةُ ٱلشَّٰفِعِينَ.

السؤال / Question: ما هي الدلائل التي تشير بأن الانسان مخير؟ (English: What are the indications that humans have free choice?)

الإجابات الذهبية / Gold Answers:

لِمَن شَآءَ مِنكُمْ أَن يَتَقَدَّمَ أَوْ يَتَأَخَّرَ
كُلُّ نَفْسٍۭ بِمَا كَسَبَتْ رَهِينَةٌ
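The output constraints above (every answer is a span of the passage, at most 10 ranked answers, and an empty list when the passage holds no answer) can be checked before submitting a run. The following is a minimal sketch only; the function name `validate_answers` and the plain-string inputs are assumptions made for illustration and do not reflect the official dataset or run format.

```python
MAX_ANSWERS = 10  # the task allows a ranked list of up to 10 answer spans


def validate_answers(passage: str, answers: list[str]) -> list[str]:
    """Return a list of problems found in one question's ranked answers.

    Hypothetical helper, not part of the official tooling. An empty
    `answers` list is valid: it signals that the passage has no answer.
    """
    problems = []
    if len(answers) > MAX_ANSWERS:
        problems.append(f"too many answers: {len(answers)} > {MAX_ANSWERS}")
    for rank, answer in enumerate(answers, start=1):
        if not answer.strip():
            problems.append(f"rank {rank}: empty answer")
        elif answer not in passage:
            # Each answer must be extracted verbatim from the passage.
            problems.append(f"rank {rank}: answer is not a span of the passage")
    return problems


# The two gold answers from the example above, joined into a tiny passage.
verse_a = "لِمَن شَآءَ مِنكُمْ أَن يَتَقَدَّمَ أَوْ يَتَأَخَّرَ"
verse_b = "كُلُّ نَفْسٍۭ بِمَا كَسَبَتْ رَهِينَةٌ"
passage = f"{verse_a}. {verse_b}."

print(validate_answers(passage, [verse_a, verse_b]))  # valid ranked list
print(validate_answers(passage, []))                  # valid no-answer output
print(validate_answers(passage, ["not in passage"]))  # flagged as a non-span
```

The check uses exact substring matching, so any normalization applied to the passage (for example, stripping diacritics) would also have to be applied to the answers first.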

Evaluation Measures

We will use partial Average Precision (pAP) [1] as the main evaluation measure. It is a rank-based measure that integrates partial matching, so it gives credit to a QA system that retrieves an answer that is not at the first rank and/or that only partially (i.e., not exactly) matches one of the gold answers. Moreover, pAP can evaluate questions that have one or more answers in the accompanying passage. This makes pAP more suitable for the RC task of Qur'an QA 2023 than partial Reciprocal Rank (pRR) [2], the main evaluation measure for Qur'an QA 2022, where participating systems were only required to return any one answer to a given question, even if the question had more than one answer in the accompanying passage.
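To make the idea of a rank-based measure with partial matching concrete, here is a simplified sketch. It is not the official pAP definition from [1]: the token-overlap F1 matching, the rule that each gold answer is credited at most once, and the names `token_f1` and `soft_average_precision` are all assumptions chosen for illustration. The released evaluation script remains the reference.

```python
from collections import Counter


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized spans (assumed matcher)."""
    pred_tokens, gold_tokens = pred.split(), gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def soft_average_precision(ranked: list[str], gold: list[str]) -> float:
    """Illustrative average precision with partial credit (NOT the official pAP)."""
    if not gold:
        raise ValueError("this sketch only covers questions that have answers")
    unmatched = list(gold)
    cumulative = 0.0  # sum of partial credits down to the current rank
    total = 0.0
    for rank, pred in enumerate(ranked, start=1):
        if not unmatched:
            break
        best = max(unmatched, key=lambda g: token_f1(pred, g))
        credit = token_f1(pred, best)
        if credit > 0:
            unmatched.remove(best)  # each gold answer is credited at most once
        cumulative += credit
        # Soft precision at this rank, weighted by this answer's own credit.
        total += credit * cumulative / rank
    return total / len(gold)


print(soft_average_precision(["a b c", "d e"], ["a b c", "d e"]))  # all found, in order
print(soft_average_precision(["x y", "a b c"], ["a b c"]))         # correct answer at rank 2
print(soft_average_precision(["a b"], ["a b c"]))                  # partial match at rank 1
```

In this sketch, a system that finds every gold answer exactly and ranks them first scores 1.0, while answers that match only partially or appear lower in the ranking earn proportionally less.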

As in the PR task, no-answer cases are handled simply: a system receives full credit if it returns no answers for such a question, and zero otherwise.

To get an overall evaluation score, the measure is averaged over all questions.
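The no-answer rule and the averaging step can be sketched as follows. This is a minimal illustration under stated assumptions: `score_question`, `overall_score`, and the stand-in scorer `top_exact_match` are hypothetical names, and the stand-in is deliberately trivial; in practice the per-question measure would be pAP as computed by the official evaluation script.

```python
from statistics import mean
from typing import Callable

Scorer = Callable[[list[str], list[str]], float]


def top_exact_match(ranked: list[str], gold: list[str]) -> float:
    """Trivial stand-in scorer (an assumption): 1.0 if the top answer is a gold answer."""
    return 1.0 if ranked[0] in gold else 0.0


def score_question(ranked: list[str], gold: list[str], scorer: Scorer) -> float:
    """Score one question, applying the no-answer rule before the scorer."""
    if not gold:
        # No answer exists in the passage: full credit only for an empty output.
        return 1.0 if not ranked else 0.0
    if not ranked:
        return 0.0  # answers exist, but the system returned none
    return scorer(ranked, gold)


def overall_score(runs: list[tuple[list[str], list[str]]], scorer: Scorer) -> float:
    """Average the per-question scores over all questions."""
    return mean(score_question(ranked, gold, scorer) for ranked, gold in runs)


runs = [
    ([], []),        # no-answer question, system returned nothing
    (["x"], []),     # no-answer question, system returned an answer
    (["a"], ["a"]),  # answerable question, correct top answer
    ([], ["a"]),     # answerable question, system returned nothing
]
print(overall_score(runs, top_exact_match))
```

Keeping the no-answer rule outside the scorer means the same wrapper works with any per-question measure.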

Registration

Detailed information on registering for the task is here.

Dataset

Detailed information on the dataset format, along with the download, is here.

Download the Evaluation Script

The evaluation script is released on our main repo.

Run Submission

Detailed information on formatting and submitting your runs is here.
