Task B Run Submission

Run File

Each participating team is required to submit one file in a specific format, which we call a "run file", or a "run" in short. The run file should include answers to all the questions (of the dev or the test sets) that do have answers, and abstain from providing answers to questions that do not have an answer in the Holy Qur'an (i.e., none in the accompanying Qur'anic passage). We refer to such a question as a "zero-answer" question. Each team is allowed to submit up to 30 runs on the dev set, but only up to 3 runs on the test set. Each run typically constitutes the results of a different system or model.

The name of each submitted run file should follow the naming format below:

<TeamID_RunID.json>

such that TeamID is the ID of the team (as registered) and RunID is the ID of the submitted run.

For example, bigIR_run01.json.
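A file name can be checked against the TeamID_RunID.json pattern with a short script like the one below. This is an illustrative sketch only; the exact characters allowed in TeamID and RunID are an assumption based on the bigIR_run01.json example, not a rule stated by the task.

```python
import re

# Assumed pattern for <TeamID_RunID.json>: alphanumeric (and dash) team ID,
# an underscore, an alphanumeric run ID, and a .json extension.
RUN_NAME_PATTERN = re.compile(r"^[A-Za-z0-9\-]+_[A-Za-z0-9]+\.json$")

def is_valid_run_name(filename: str) -> bool:
    """Return True if the file name looks like TeamID_RunID.json."""
    return RUN_NAME_PATTERN.match(filename) is not None

print(is_valid_run_name("bigIR_run01.json"))  # True
print(is_valid_run_name("bigIR_run01.txt"))   # False
```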

Format of the Run File 

The expected run file is in JSON format. It contains a list of passage-question IDs (pq_id), each with its respective ranked list of returned answers. For each passage-question pair, the system can either abstain from providing an answer (if it predicts that the question has no answer) or return up to 10 predicted answers, along with their ranks, their estimated scores, and their start/end token positions in the accompanying passage. Only the ranks and the start/end token positions are used in the evaluation (not the estimated scores).
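The structure described above can be assembled programmatically. The sketch below builds a run in memory and serializes it to JSON; the field names follow the format description, but the pq_id keys, helper name, and answer spans are placeholders for illustration.

```python
import json

run = {}

def add_answers(run, pq_id, spans):
    """Attach up to 10 ranked answers for one passage-question pair.

    `spans` is a list of (answer_text, score, start_token, end_token)
    tuples, assumed to be already sorted by descending score.
    """
    run[pq_id] = [
        {"answer": text, "rank": rank, "score": score,
         "strt_token_indx": start, "end_token_indx": end}
        for rank, (text, score, start, end) in enumerate(spans[:10], start=1)
    ]

# Placeholder pq_ids and answers, for illustration only.
add_answers(run, "2:1-5_001", [("example answer", 0.8, 0, 1)])
run["2:6-7_002"] = []  # abstain: a zero-answer question gets an empty list

with open("TeamID_RunID.json", "w", encoding="utf-8") as f:
    json.dump(run, f, ensure_ascii=False, indent=4)
```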

For the zero-answer questions, the list of retrieved answers will just be empty, as shown in the example below.

The run file format is shown below for a sample of three questions (one of which is a zero-answer question).

{
    "38:41-44_105": [
        {
            "answer": "أيوب",
            "rank": 1,
            "score": 0.9586813087043423,
            "strt_token_indx": 2,
            "end_token_indx": 2
        },
        {
            "answer": "إنه أواب",
            "rank": 2,
            "score": 0.014768138560114058,
            "strt_token_indx": 42,
            "end_token_indx": 43
        },
        {
            "answer": "ولا تحنث إنا وجدناه صابرا نعم العبد إنه أواب",
            "rank": 3,
            "score": 0.0052241458173706255,
            "strt_token_indx": 35,
            "end_token_indx": 43
        },
        {
            "answer": "واذكر عبدنا أيوب",
            "rank": 4,
            "score": 0.0026888978292958256,
            "strt_token_indx": 0,
            "end_token_indx": 2
        }
    ],
    "74:32-48_330": [
        {
            "answer": "كل نفس بما كسبت رهينة",
            "rank": 1,
            "score": 0.7335555760226602,
            "strt_token_indx": 26,
            "end_token_indx": 30
        },
        {
            "answer": "لمن شاء منكم أن يتقدم أو يتأخر . كل نفس بما كسبت رهينة",
            "rank": 2,
            "score": 0.19330937303913176,
            "strt_token_indx": 18,
            "end_token_indx": 30
        },
        {
            "answer": "لمن شاء منكم أن يتقدم أو يتأخر",
            "rank": 3,
            "score": 0.07103693247802075,
            "strt_token_indx": 18,
            "end_token_indx": 24
        }
    ],
    "28:85-88_322": []
}
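A run file in this format can be sanity-checked locally with a script like the one below. This is an illustrative sketch only, not the official checker (which is linked in the next section); the field names are taken from the example above, and the specific constraints checked (at most 10 answers, consecutive ranks, start token not after end token) are assumptions based on this description.

```python
import json

REQUIRED_KEYS = {"answer", "rank", "score", "strt_token_indx", "end_token_indx"}

def check_run(path: str) -> list[str]:
    """Collect format problems in a run file; an empty list means it looks OK."""
    problems = []
    with open(path, encoding="utf-8") as f:
        run = json.load(f)
    for pq_id, answers in run.items():
        # Zero-answer questions are represented by an empty list, so an
        # empty `answers` is valid and the loop body is simply skipped.
        if len(answers) > 10:
            problems.append(f"{pq_id}: more than 10 answers")
        for i, ans in enumerate(answers, start=1):
            missing = REQUIRED_KEYS - ans.keys()
            if missing:
                problems.append(f"{pq_id}: answer {i} missing {sorted(missing)}")
                continue
            if ans["rank"] != i:
                problems.append(f"{pq_id}: answer {i} has rank {ans['rank']}")
            if ans["strt_token_indx"] > ans["end_token_indx"]:
                problems.append(f"{pq_id}: answer {i} starts after it ends")
    return problems
```

For submission, always use the official checker script; this sketch only catches obvious structural mistakes before uploading.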

Download the Submission Checker Script

The run file submission checker script is released on our main repo.

Leaderboard and Submission Site

Having completed the registration steps mentioned here, you can submit your runs by following the steps below.

How to submit your runs

The following steps should be done by the team leader only, as only the team leader will be approved on Codalab.

Number of runs: Please note that in the development phase you can submit up to 30 runs, whereas in the testing phase you are allowed to submit only 3 runs. In both phases, only the best run will be shown on the leaderboard.

Baseline: The run shown on the leaderboard under the username watheq9 is a simple baseline that answers each question by returning the full passage. It is there just as a reference point.