Task B Dataset
Dataset Format
Our dataset for Task B is the Qur’anic Reading Comprehension Dataset (QRCD) v1.2. QRCD_v1.2, the version we are currently releasing, is composed of the original 1,093 question-passage (QP) pairs of QRCD_v1.1 plus 469 new QP pairs, 407 of which are introduced for evaluating the systems in the machine reading comprehension (MRC) task. In total, the 1,562 QP pairs are coupled with their extracted answers to constitute 1,889 question-passage-answer triplets. To make the reading comprehension task more realistic, and thus more challenging, this version of QRCD includes questions that do not have an answer in the Holy Qur’an; we call them zero-answer questions. Overall, QRCD_v1.2 includes 76 QP pairs (about 5%) for zero-answer questions. Apart from the difference in size, the inclusion of zero-answer questions is the main difference between QRCD_v1.1 and QRCD_v1.2.
QRCD is a JSON Lines (JSONL) file: each line is a JSON object that comprises a question-passage pair along with its answers extracted from the accompanying passage. The dataset adopts the format shown below; the sample has three JSON objects, one of which is a zero-answer question.
}
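As a rough illustration of how a JSON Lines file of this kind can be consumed, the Python sketch below parses three placeholder records and picks out the zero-answer one. The field names (pq_id, passage, question, answers, text, start_char), the placeholder values, and the convention that a zero-answer question carries an empty answers list are all assumptions made for this example; check them against the released files and the official reader script before relying on them.

```python
import json

# Three placeholder records in JSON Lines form (one JSON object per line).
# Field names and values are illustrative assumptions, not QRCD content.
SAMPLE_JSONL = """\
{"pq_id": "demo_1", "passage": "alpha beta gamma", "question": "q1", "answers": [{"text": "beta", "start_char": 6}]}
{"pq_id": "demo_2", "passage": "delta epsilon zeta", "question": "q2", "answers": [{"text": "delta", "start_char": 0}, {"text": "zeta", "start_char": 14}]}
{"pq_id": "demo_3", "passage": "eta theta iota", "question": "q3", "answers": []}
"""


def read_jsonl(lines):
    """Yield one dict per non-empty line of a JSON Lines source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def is_zero_answer(record):
    """Assumed convention: a zero-answer question has an empty answers list."""
    return not record.get("answers")


def answer_spans_match(record):
    """Check that each answer text sits in the passage at its stated offset."""
    passage = record["passage"]
    return all(
        passage[a["start_char"]:a["start_char"] + len(a["text"])] == a["text"]
        for a in record["answers"]
    )


records = list(read_jsonl(SAMPLE_JSONL.splitlines()))
zero_answer_ids = [r["pq_id"] for r in records if is_zero_answer(r)]
spans_ok = all(answer_spans_match(r) for r in records)

print(len(records), "records;", "zero-answer:", zero_answer_ids)
```

To read a downloaded file instead of the inline sample, the same function can be given an open file handle, for example read_jsonl(open(path, encoding="utf-8")), where path is whatever local file name you saved the dataset under. Opening the file as UTF-8 matters here because the passages and questions are Arabic text.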
Download the Dataset
You can download the training and dev sets of QRCD from our main repo.
We will release the test set on August 14, 2023.
Download the Reader Script
A reader script for QRCD is available in our main repo.
How to cite
If you use the QRCD dataset in your research, please cite the following references:
Rana Malhas and Tamer Elsayed. Arabic Machine Reading Comprehension on the Holy Qur’an using CL-AraBERT. Information Processing & Management, 59(6), p.103068, 2022.
Rana Malhas and Tamer Elsayed. AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(6), pp.1-21, 2020.
Acknowledgments
We would like to thank all the Qur’an specialists who contributed to annotating/rating the question-answer pairs, especially Dr. Ahmad Shukri, Professor of Tafseer and Qur’anic Sciences at Qatar University, for his scholarly advice throughout the annotation process of the answers extracted from the Holy Qur’an.