Tasks 3 & 4: Evidence & Factuality

Don't forget to register through CLEF2020 Lab Registration before 26 April 2020, using this link. Otherwise, your submission will NOT be considered!

Task 3: Evidence Retrieval

Definition

Task Definition: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of evidence snippets for the claim. Evidence snippets are those snippets that are useful in verifying the given claim. This task will run in Arabic.

Evidence Snippet: An evidence snippet is a text snippet from a webpage that constitutes evidence supporting or refuting the claim.

Evaluation

This task is evaluated as a ranking task. The ranked list per topic will be evaluated using standard ranking measures (MAP, P@5, P@10, …, P@30). The official measure is P@10.
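For reference, here is a minimal sketch (not the official scorer) of how P@k and average precision can be computed, assuming `ranked` is a list of snippet IDs in rank order and `relevant` is the set of snippet IDs judged to be evidence for the claim:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked snippets that are relevant (P@k)."""
    top_k = ranked[:k]
    return sum(1 for s in top_k if s in relevant) / k

def average_precision(ranked, relevant):
    """Average of P@k over the ranks k at which a relevant snippet appears."""
    hits, score = 0, 0.0
    for k, snippet in enumerate(ranked, start=1):
        if snippet in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

# MAP is the mean of average_precision over all topics/claims.
```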

Submission Runs

Each team can submit up to 2 manual and 4 automatic runs as follows:

  • For each run, you will have to explicitly indicate if it is “external” (i.e., uses external data) or not.
  • Pre-trained models (not labelled for fact checking, e.g., embeddings or word statistics) are not considered external.
  • At least one of the runs must be automatic without use of external data.

Submission Format

Submit a separate results file per run. Evidence snippets per topic must be sorted by rank (from rank 1 to n). For each run, use the following format.

The results file should include a ranking of the top 100 evidence snippets per claim (tweet). It must include one tab-separated line per evidence snippet, formatted as follows:

topicID  tweetID  rank  snippetID  score  runID
CT20-AR-05  1219151214690041857  1  CT20-AR-05-0003-001  0.77  teamXrun1
CT20-AR-05  1219151214690041857  2  CT20-AR-05-0005-004  0.74  teamXrun1
CT20-AR-05  1219151214690041857  3  CT20-AR-05-0036-002  0.68  teamXrun1

where the score is a number indicating the usefulness of the snippet for fact-checking the tweet, the rank is the position of the snippet according to its score, and the runID is a unique identifier for one of the team's runs.
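As an illustration, the following sketch writes a run file in this format; the in-memory `results` dictionary (populated here with the example IDs above) and the run ID "teamXrun1" are purely illustrative:

```python
# Map each (topicID, tweetID) pair to a list of (snippetID, score) pairs.
results = {
    ("CT20-AR-05", "1219151214690041857"): [
        ("CT20-AR-05-0003-001", 0.77),
        ("CT20-AR-05-0005-004", 0.74),
        ("CT20-AR-05-0036-002", 0.68),
    ],
}

run_id = "teamXrun1"
with open(f"task3_{run_id}.tsv", "w", encoding="utf-8") as out:
    for (topic_id, tweet_id), snippets in results.items():
        # Rank snippets by descending score and keep at most the top 100.
        ranked = sorted(snippets, key=lambda s: s[1], reverse=True)[:100]
        for rank, (snippet_id, score) in enumerate(ranked, start=1):
            out.write(f"{topic_id}\t{tweet_id}\t{rank}\t{snippet_id}\t{score}\t{run_id}\n")
```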

Task 4: Claim Verification

Definition

Given a check-worthy claim on a specific topic and a set of potentially-relevant webpages, predict the veracity of the claim: TRUE or FALSE. This task will run in Arabic.

Evaluation

This is a classical binary classification task. The evaluation measures are standard: precision, recall, and F1. The official measure is macro-averaged F1.
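For reference, a minimal sketch (not the official scorer) of macro-averaged F1 over the two classes; the short gold/prediction lists at the end are only illustrative:

```python
def macro_f1(gold, pred, labels=("TRUE", "FALSE")):
    """Compute F1 per class and average the per-class scores."""
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: macro-F1 over three claims.
print(macro_f1(["TRUE", "FALSE", "FALSE"], ["TRUE", "TRUE", "FALSE"]))
```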

Submission Runs

Each team can submit up to 2 manual and 4 automatic runs as follows:

  • For each run, you will have to explicitly indicate if it is “external” (i.e., uses external data) or not.
  • Pre-trained models (not labelled for fact checking, e.g., embeddings or word statistics) are not considered external.
  • At least one of the runs must be automatic without use of external data.

Submission Format

Submit a separate results file per run. For each run, use the following format.

The results file should include one tab-separated line per claim formatted as follows:

topicID  tweetID  label  runID
CT20-AR-05  1218603003755798529  FALSE  teamXrun1
CT20-AR-05  1219151214690041857  TRUE  teamXrun1
CT20-AR-05  1217636592908689409  FALSE  teamXrun1
...

Where the label is one of: [TRUE, FALSE].

Your result file MUST contain predictions for all claims from the respective input file. Otherwise, the scorer will not score this result file.
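A minimal sketch of producing a complete Task 4 results file is given below; the claim list, the placeholder `predict_label` function, and the run ID are illustrative stand-ins for your own input parsing and classifier:

```python
def predict_label(topic_id, tweet_id):
    # Placeholder classifier; replace with the actual veracity prediction.
    return "FALSE"

# Every claim from the input file must receive a prediction.
claims = [
    ("CT20-AR-05", "1218603003755798529"),
    ("CT20-AR-05", "1219151214690041857"),
    ("CT20-AR-05", "1217636592908689409"),
]

run_id = "teamXrun1"
with open(f"task4_{run_id}.tsv", "w", encoding="utf-8") as out:
    for topic_id, tweet_id in claims:
        label = predict_label(topic_id, tweet_id)
        assert label in ("TRUE", "FALSE")
        out.write(f"{topic_id}\t{tweet_id}\t{label}\t{run_id}\n")
```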