Award Winners are announced!
Our dataset for the task, called QRCD (Qur'anic Reading Comprehension Dataset), comprises 1,093 question-passage pairs, which are coupled with their extracted answers to form 1,337 question-passage-answer triplets. It is split into training (65%), development (10%), and test (25%) sets.
QRCD is a JSON Lines (JSONL) file; each line is a JSON object that comprises a question-passage pair, along with its answers extracted from the accompanying passage. The dataset adopts the format shown below. The sample below has two JSON objects, one for each of the above two questions.
You can download the training and dev sets of QRCD from our main repo.
We will release the test set on March 26th, 2022.
A reader script for QRCD is available on our main repo.
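The official reader script remains the reference for loading QRCD. As a rough illustration of what reading a JSON Lines file involves, the following is a minimal sketch that parses one JSON object per line. The function name `read_jsonl` is hypothetical and is not taken from the official script, and the sketch makes no assumption about the field names inside each object.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; not the official QRCD reader.
    """
    records = []
    # UTF-8 is specified explicitly because the passages are Arabic text.
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between objects
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}: invalid JSON on line {line_no}") from err
    return records
```

Calling `read_jsonl` on a downloaded training or dev file should return one dictionary per question-passage pair, which can then be inspected to see the actual keys used by the dataset.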
If you use the QRCD dataset in your research, please cite the following references:
Rana Malhas and Tamer Elsayed. Arabic Machine Reading Comprehension on the Holy Qur’an using CL-AraBERT. Information Processing & Management, 59(6), p.103068, 2022.
Rana Malhas and Tamer Elsayed. Official Repository of Qur’an QA Shared Task. https://gitlab.com/bigirqu/quranqa. February 2022.
Rana Malhas and Tamer Elsayed. AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(6), pp.1-21, 2020.
We would like to thank all the Qur’an specialists who contributed to annotating/rating the question-answer pairs, especially Dr. Ahmad Shukri, Professor of Tafseer and Qur’anic Sciences at Qatar University, for his scholarly advice throughout the annotation process of the answers extracted from the Holy Qur'an.