Dataset

Dataset Format

Our dataset for the task, called QRCD (Qur'anic Reading Comprehension Dataset), is composed of 1,093 tuples of question-passage pairs that are coupled with their extracted answers to constitute 1,337 question-passage-answer triplets. It is split into training (65%), development (10%), and test (25%) sets.

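QRCD is a JSON Lines (JSONL) file; each line is a JSON object that comprises a question-passage pair, along with its answers extracted from the accompanying passage. Each object carries the fields "pq_id", "passage", "surah", "verses", "question", and "answers"; every answer gives its "text" and the "start_char" offset at which it begins in the passage. The sample below shows two JSON objects, one for each of two example questions.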

{ "pq_id": "38:41-44_105", "passage": "واذكر عبدنا أيوب إذ نادى ربه أني مسني الشيطان بنصب وعذاب. اركض برجلك هذا مغتسل بارد وشراب. ووهبنا له أهله ومثلهم معهم رحمة منا وذكرى لأولي الألباب. وخذ بيدك ضغثا فاضرب به ولا تحنث إنا وجدناه صابرا نعم العبد إنه أواب.", "surah": 38, "verses": "41-44", "question": "من هو النبي المعروف بالصبر؟", "answers": [ { "text": "أيوب", "start_char": 12 } ]}
{ "pq_id": "74:32-48_330", "passage": "كلا والقمر. والليل إذ أدبر. والصبح إذا أسفر. إنها لإحدى الكبر. نذيرا للبشر. لمن شاء منكم أن يتقدم أو يتأخر. كل نفس بما كسبت رهينة. إلا أصحاب اليمين. في جنات يتساءلون. عن المجرمين. ما سلككم في سقر. قالوا لم نك من المصلين. ولم نك نطعم المسكين. وكنا نخوض مع الخائضين. وكنا نكذب بيوم الدين. حتى أتانا اليقين. فما تنفعهم شفاعة الشافعين.", "surah": 74, "verses": "32-48", "question": "ما هي الدلائل التي تشير بأن الانسان مخير؟", "answers": [ { "text": "لمن شاء منكم أن يتقدم أو يتأخر", "start_char": 76 }, { "text": "كل نفس بما كسبت رهينة", "start_char": 108 } ]}
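
In the sample above, "start_char" appears to be a 0-based character offset into the passage. The Python sketch below works under that assumption: it parses the first sample record and checks that each answer text can be recovered from the passage at its stated offset. The helper name answer_spans is hypothetical and is not part of the released tooling.

```python
import json


def answer_spans(record):
    """Return (text, start, end) for every answer, verified against the passage."""
    passage = record["passage"]
    spans = []
    for answer in record["answers"]:
        start = answer["start_char"]
        end = start + len(answer["text"])
        # Assumption: start_char is a 0-based offset counted in Unicode
        # code points, which is how Python indexes str objects.
        if passage[start:end] != answer["text"]:
            raise ValueError(f"answer span mismatch in {record['pq_id']}")
        spans.append((answer["text"], start, end))
    return spans


# First record of the sample, as a single JSONL line.
line = '{"pq_id": "38:41-44_105", "passage": "واذكر عبدنا أيوب إذ نادى ربه أني مسني الشيطان بنصب وعذاب. اركض برجلك هذا مغتسل بارد وشراب. ووهبنا له أهله ومثلهم معهم رحمة منا وذكرى لأولي الألباب. وخذ بيدك ضغثا فاضرب به ولا تحنث إنا وجدناه صابرا نعم العبد إنه أواب.", "surah": 38, "verses": "41-44", "question": "من هو النبي المعروف بالصبر؟", "answers": [{"text": "أيوب", "start_char": 12}]}'

record = json.loads(line)
spans = answer_spans(record)
```

If an offset were counted differently (for example in bytes), the check would raise a ValueError rather than silently returning a wrong span.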

Download the Dataset

You can download the training and dev sets of QRCD from our main repo.

We will release the test set on March 26th 2022.

Download the Reader Script

A reader script for QRCD is available on our main repo.
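
Because QRCD is a JSON Lines file, it can also be loaded with the Python standard library alone. The sketch below is not the official reader script; the function name read_qrcd and the file name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def read_qrcd(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so a malformed record is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


# Usage (hypothetical file name):
# records = list(read_qrcd("qrcd_train.jsonl"))
```

Reading lazily with a generator keeps memory use flat, although a dataset of this size would also fit comfortably in a list.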

How to cite

If you use the QRCD dataset in your research, please cite the following references:

Rana Malhas and Tamer Elsayed. Arabic Machine Reading Comprehension on the Holy Qur’an using CL-AraBERT. Information Processing & Management, 59(6), p.103068, 2022.
Rana Malhas and Tamer Elsayed. Official Repository of Qur’an QA Shared Task. https://gitlab.com/bigirqu/quranqa. February 2022.
Rana Malhas and Tamer Elsayed. AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 19(6), pp.1-21, 2020.

Acknowledgements

We would like to thank all the Qur’an specialists who contributed to annotating/rating the question-answer pairs, especially Dr. Ahmad Shukri, Professor of Tafseer and Qur’anic Sciences at Qatar University, for his scholarly advice throughout the annotation process of the answers extracted from the Holy Qur'an.
