Task A Dataset

Datasets Description

The datasets for Task A include the Qur'anic Passage Collection (QPC), the questions of the AyaTEC dataset, and their relevance judgments over the QPC passages. The QPC was built by topically segmenting the 114 Qur'anic chapters, which vary in length, using the Thematic Holy Qur'an [1], a printed edition that groups the verses of each chapter into topics. This segmentation yielded a total of 1,266 passages. The figure below shows two pages from the Thematic Holy Qur'an, visually segmented by color into four themes/topics according to their respective descriptions.

The AyaTEC questions used in this task total 251. They are split into training (70%), development (10%), and test (20%) sets. To make the passage retrieval task more realistic (and thus more challenging), we have included 37 questions (about 15%) that have no answer in the Holy Qur’an. We call these zero-answer questions.

The query relevance judgments (QRels) dataset contains 1,599 gold (answer-bearing) Qur'anic passage-ids, each judged relevant to its respective question. For zero-answer questions, the passage-id has the value "-1".
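The QRels description above can be made concrete with a short loader. This is a minimal sketch, assuming each QRels line is a whitespace-separated, TREC-style record `<question-id> Q0 <passage-id> <relevance>`; that column layout, the function names `load_qrels` and `zero_answer_questions`, and the sample IDs are hypothetical and should be checked against the actual files in the repo. It groups gold passage-ids by question and treats a passage-id of "-1" as marking a zero-answer question.

```python
# Assumption: QRels lines look like "<question-id> Q0 <passage-id> <relevance>"
# (TREC-style, whitespace-separated). Verify against the released files.
NO_ANSWER_ID = "-1"  # passage-id used for zero-answer questions


def load_qrels(lines):
    """Map each question id to the set of its gold passage-ids.

    Zero-answer questions are kept in the mapping with an empty set.
    """
    qrels = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        qid, pid = fields[0], fields[2]
        gold = qrels.setdefault(qid, set())  # record the question either way
        if pid != NO_ANSWER_ID:
            gold.add(pid)
    return qrels


def zero_answer_questions(qrels):
    """Return the sorted ids of questions with no gold passage."""
    return sorted(qid for qid, gold in qrels.items() if not gold)


# Hypothetical sample lines in the assumed layout (not taken from the real data).
SAMPLE = [
    "Q1 Q0 2:1-5 1",
    "Q1 Q0 3:10-12 1",
    "Q2 Q0 -1 1",
]

if __name__ == "__main__":
    qrels = load_qrels(SAMPLE)
    print(sorted(qrels["Q1"]))           # ['2:1-5', '3:10-12']
    print(zero_answer_questions(qrels))  # ['Q2']
```

Representing a zero-answer question as an empty set, rather than dropping it, keeps it visible to any downstream evaluation code that needs to distinguish "no gold passages" from "question missing".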

Two pages from the Thematic Holy Qur'an categorized into different themes by color.

Download the Dataset

You can download the training and dev sets for Task-A from our main repo.

We will release the test set on August 14, 2023.

References 

[1] Swar, M. N., Mushaf Al-Tafseel Al-Mawdoo’ee. Damascus: Dar Al-Fajr Al-Islami, 2007. 

How to cite

If you use the data of Task-A in your research, please cite the following references:

Acknowledgments

We would like to thank all the Qur’an specialists who contributed to annotating/rating the question-answer pairs, especially Dr. Ahmad Shukri, Professor of Tafseer and Qur’anic Sciences at Qatar University, for his scholarly advice throughout the annotation process of the answers extracted from the Holy Qur'an.