Task Definition: Given a claim and a set of Web pages P (the results of a Web search issued with the claim as the query), identify which of those pages (and which passages within them) are useful for assisting a human fact-checker in verifying the claim. Finally, judge the factuality of the claim according to the supporting information found in the passages of the Web pages. This task will be run in Arabic.
Definition of usefulness: A page is considered useful for verification if it is relevant to the claim (i.e., on-topic and discussing the claim) and it provides evidence that can verify the veracity of the claim. Evidence can be a source, a statistic, a quote, etc. However, a piece of evidence is considered invalid if (1) its source cannot be verified (e.g., stating that "experts say that ..." without naming those experts), or (2) it is merely the opinion of a person or expert rather than an objective analysis. Note that this is different from stance detection: a page might agree with a claim yet still lack the evidence needed to verify it.
Each team can submit up to 4 runs per subtask as follows:
For each subtask, submit one separate results file per run, using the following formats.
For the page ranking subtask, the results file contains one tab-separated line per page, formatted as follows:
claimID rank pageID score runID
where score is a number indicating the usefulness of the Web page, rank is the position of the page when all pages for the claim are ordered by descending score, and runID is an ID that uniquely identifies this run of the team.
For example:
CT19-T2-XXX 1 CT19-T2-XXX-05 0.77 teamXrun1
CT19-T2-XXX 2 CT19-T2-XXX-01 0.74 teamXrun1
CT19-T2-XXX 3 CT19-T2-XXX-07 0.68 teamXrun1
...
Your result file MUST contain scores for all pages from the respective input file. Otherwise, the scorer will not score this result file.
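As an illustration, the following Python sketch shows one way such a ranking file could be produced; the scores, IDs, and output file name are hypothetical and not part of the task data.

```python
# A minimal sketch of producing a ranking run file.
# The claim/page IDs and scores are the hypothetical values from the example above.
scores = {
    "CT19-T2-XXX": {
        "CT19-T2-XXX-05": 0.77,
        "CT19-T2-XXX-01": 0.74,
        "CT19-T2-XXX-07": 0.68,
    },
}
run_id = "teamXrun1"

with open("teamXrun1_ranking.tsv", "w", encoding="utf-8") as out:
    for claim_id, pages in scores.items():
        # Sort pages by descending usefulness score and assign ranks 1, 2, 3, ...
        ranked = sorted(pages.items(), key=lambda kv: kv[1], reverse=True)
        for rank, (page_id, score) in enumerate(ranked, start=1):
            out.write(f"{claim_id}\t{rank}\t{page_id}\t{score}\t{run_id}\n")
```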
For the page classification subtask, the results file contains one tab-separated line per page, formatted as follows:
claimID pageID label runID
where the label is one of: [2: very useful, 1: useful, 0: not useful, -1: not relevant].
For example:
CT19-T2-XXX CT19-T2-XXX-01 0 teamXrun1
CT19-T2-XXX CT19-T2-XXX-03 2 teamXrun1
CT19-T2-XXX CT19-T2-XXX-05 -1 teamXrun1
...
Your result file MUST contain predictions for all pages from the respective input file. Otherwise, the scorer will not score this result file.
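Analogously, a page classification file could be written as in the following sketch; the labels, IDs, and file name are illustrative only.

```python
# A minimal sketch, assuming page-level labels are already available as a dictionary.
page_labels = {
    ("CT19-T2-XXX", "CT19-T2-XXX-01"): 0,   # not useful
    ("CT19-T2-XXX", "CT19-T2-XXX-03"): 2,   # very useful
    ("CT19-T2-XXX", "CT19-T2-XXX-05"): -1,  # not relevant
}
run_id = "teamXrun1"

with open("teamXrun1_page_labels.tsv", "w", encoding="utf-8") as out:
    for (claim_id, page_id), label in page_labels.items():
        out.write(f"{claim_id}\t{page_id}\t{label}\t{run_id}\n")
```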
For the passage classification subtask, the results file contains one tab-separated line per passage of a useful page, formatted as follows:
claimID pageID passageID label runID
where the label is one of: [1: useful or very useful, 0: not useful].
For example:
CT19-T2-XXX CT19-T2-XXX-01 CT19-T2-XXX-01-01 0 teamXrun1
CT19-T2-XXX CT19-T2-XXX-01 CT19-T2-XXX-01-02 1 teamXrun1
CT19-T2-XXX CT19-T2-XXX-01 CT19-T2-XXX-01-03 1 teamXrun1
CT19-T2-XXX CT19-T2-XXX-03 CT19-T2-XXX-03-01 0 teamXrun1
...
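Before submitting, it may help to sanity-check the passage-level file against the format above. The sketch below assumes a hypothetical file name, and the check that a passage ID starts with its page ID is inferred from the example IDs and may not hold in general.

```python
# A minimal pre-submission sanity check for a passage-level run file.
VALID_LABELS = {"0", "1"}

with open("teamXrun1_passage_labels.tsv", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        claim_id, page_id, passage_id, label, run_id = fields
        if label not in VALID_LABELS:
            raise ValueError(f"line {line_no}: label must be 0 or 1, got {label!r}")
        if not passage_id.startswith(page_id):
            raise ValueError(f"line {line_no}: passage {passage_id} does not match page {page_id}")
```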
For the claim verification subtask, the results file contains one tab-separated line per claim, formatted as follows:
claimID label runID
where the label is one of: [TRUE, FALSE].
For example:
CT19-T2-XXX TRUE teamXrun1
CT19-T2-XXX FALSE teamXrun1
CT19-T2-XXX FALSE teamXrun1
...
Your result file MUST contain predictions for all claims from the respective input file. Otherwise, the scorer will not score this result file.
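A claim verification file could be produced along the same lines; the predictions and file name in the sketch below are illustrative only.

```python
# A minimal sketch, assuming claim-level verdicts are available as booleans.
claim_predictions = {
    "CT19-T2-XXX": True,  # True -> the claim is judged factually correct
}
run_id = "teamXrun1"

with open("teamXrun1_claims.tsv", "w", encoding="utf-8") as out:
    for claim_id, is_true in claim_predictions.items():
        label = "TRUE" if is_true else "FALSE"
        out.write(f"{claim_id}\t{label}\t{run_id}\n")
```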