Tasks 3 & 4: Evidence & Factuality
Don't forget to register through CLEF2020 Lab Registration before 26 April 2020, using this link. Otherwise, your submission will NOT be considered!
Task 3: Evidence Retrieval
Definition
Task Definition: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of evidence snippets for the claim. Evidence snippets are those snippets that are useful in verifying the given claim. This task will run in Arabic.
Evidence Snippet: An evidence snippet is a text snippet from a webpage that constitutes evidence supporting or refuting the claim.
Evaluation
This task is evaluated as a ranking task. The ranked list per topic is evaluated using standard ranking measures (MAP and P@k for k = 5, 10, ..., 30). The official measure is P@10.
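For illustration, here is a minimal sketch of how P@k and average precision could be computed for a single topic. The variable names (ranked, gold) and the example IDs are made up, and the official scorer may differ in details.

```python
# Minimal sketch of the ranking measures for one topic, assuming a
# hypothetical ranked list of snippet IDs and a set of gold evidence IDs.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved snippets that are gold evidence."""
    top_k = ranked_ids[:k]
    return sum(1 for s in top_k if s in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one topic."""
    hits, score = 0, 0.0
    for i, s in enumerate(ranked_ids, start=1):
        if s in relevant_ids:
            hits += 1
            score += hits / i
    return score / len(relevant_ids) if relevant_ids else 0.0

# Example usage with made-up IDs:
ranked = ["CT20-AR-05-0003-001", "CT20-AR-05-0005-004", "CT20-AR-05-0036-002"]
gold = {"CT20-AR-05-0003-001", "CT20-AR-05-0036-002"}
print(precision_at_k(ranked, gold, k=10), average_precision(ranked, gold))
```

MAP is then the mean of the per-topic average precision values.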
Submission Runs
Each team can submit up to 2 manual and 4 automatic runs as follows:
- For each run, you will have to explicitly indicate if it is “external” (i.e., uses external data) or not.
- Pre-trained models built from data that is not labelled for fact-checking (e.g., embeddings or word statistics) are not considered external.
- At least one of the runs must be an automatic run that uses no external data.
Submission Format
Submit one separate results file per run. Evidence snippets per topic must be sorted by rank (from rank 1 to n). For each run, use the following format.
The results file should include a ranking of the top 100 evidence snippets per claim (tweet). It must include one tab-separated line per returned snippet, formatted as follows:
topicID tweetID rank snippetID score runID
CT20-AR-05 1219151214690041857 1 CT20-AR-05-0003-001 0.77 teamXrun1
CT20-AR-05 1219151214690041857 2 CT20-AR-05-0005-004 0.74 teamXrun1
CT20-AR-05 1219151214690041857 3 CT20-AR-05-0036-002 0.68 teamXrun1
…
Where score is a number indicating the usefulness of the snippet for fact-checking the tweet, rank is the rank of the snippet according to its score, and runID is a unique identifier for one of the team's runs.
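As an illustration, the sketch below writes one run file in this format. The `results` structure, file name, and run ID are hypothetical placeholders, not part of the official tooling.

```python
# Sketch: write a Task 3 run file, assuming `results` maps
# (topicID, tweetID) -> list of (snippetID, score) pairs (hypothetical structure).

def write_task3_run(results, run_id, path, top_k=100):
    with open(path, "w", encoding="utf-8") as out:
        for (topic_id, tweet_id), scored in results.items():
            # Rank snippets by descending score and keep at most the top 100.
            ranked = sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
            for rank, (snippet_id, score) in enumerate(ranked, start=1):
                out.write(f"{topic_id}\t{tweet_id}\t{rank}\t{snippet_id}\t{score:.4f}\t{run_id}\n")

# Example usage with made-up values:
results = {("CT20-AR-05", "1219151214690041857"): [("CT20-AR-05-0003-001", 0.77),
                                                   ("CT20-AR-05-0005-004", 0.74)]}
write_task3_run(results, "teamXrun1", "task3-teamXrun1.tsv")
```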
Task 4: Claim Verification
Definition
Task Definition: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim: TRUE or FALSE. This task will run in Arabic.
Evaluation
This is a classical binary classification task. The evaluation measures are standard: precision, recall, and F1. The official measure is macro-averaged F1.
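As a reference point, the sketch below computes these measures with scikit-learn. The label lists are made up, and the official scorer may differ in its exact implementation.

```python
# Sketch: compute the Task 4 measures from gold and predicted labels
# using scikit-learn (illustrative only; not the official scorer).
from sklearn.metrics import precision_score, recall_score, f1_score

gold = ["TRUE", "FALSE", "FALSE", "TRUE"]   # made-up gold labels
pred = ["TRUE", "FALSE", "TRUE", "TRUE"]    # made-up predictions

print("precision:", precision_score(gold, pred, pos_label="TRUE"))
print("recall:   ", recall_score(gold, pred, pos_label="TRUE"))
print("macro-F1: ", f1_score(gold, pred, average="macro"))  # official measure
```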
Submission Runs
Each team can submit up to 2 manual and 4 automatic runs as follows:
- For each run, you will have to explicitly indicate if it is “external” (i.e., uses external data) or not.
- Pre-trained models built from data that is not labelled for fact-checking (e.g., embeddings or word statistics) are not considered external.
- At least one of the runs must be an automatic run that uses no external data.
Submission Format
Submit one separate results file per run. For each run, use the following format.
The results file should include one tab-separated line per claim formatted as follows:
topicID tweetID label runID
CT20-AR-05 1218603003755798529 FALSE teamXrun1
CT20-AR-05 1219151214690041857 TRUE teamXrun1
CT20-AR-05 1217636592908689409 FALSE teamXrun1
...
Where label is one of: TRUE or FALSE.
Your results file MUST contain predictions for all claims from the respective input file. Otherwise, the scorer will not score the file.
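Before submitting, it can help to sanity-check a run file against the expected format and claim coverage. The sketch below is one possible check; the file name and the `input_claims` set are hypothetical.

```python
# Sketch: sanity-check a Task 4 run file before submission.
# `input_claims` is a hypothetical set of (topicID, tweetID) pairs from the input file.

def check_task4_run(path, input_claims):
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("\t")
            assert len(fields) == 4, f"line {line_no}: expected 4 tab-separated fields"
            topic_id, tweet_id, label, run_id = fields
            assert label in {"TRUE", "FALSE"}, f"line {line_no}: bad label {label!r}"
            seen.add((topic_id, tweet_id))
    missing = input_claims - seen
    assert not missing, f"missing predictions for {len(missing)} claims"

# Example usage with made-up values:
check_task4_run("task4-teamXrun1.tsv", {("CT20-AR-05", "1218603003755798529")})
```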