Task 3: Fake News Detection

Don't forget to register through the CLEF 2022 Lab Registration before 22 April 2022, using this link. Otherwise, your submission will NOT be considered!


Definition

Task Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute). This task is offered as a mono-lingual task in English and as a cross-lingual task for English and German (English training data, German test data). The idea of the latter is to use the English data and transfer learning to build a classification model for German as well.

The training data contains about 1,300 articles in English (title and text) with the respective label (true, partially true, false, or other). Our definitions for the categories are as follows:

  • False - The main claim made in an article is untrue.

  • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information, but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading, etc., as defined by different fact-checking services.

  • True - This rating indicates that the primary elements of the main claim are demonstrably true.

  • Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes disputed and unproven articles.

File Format

The data files contain an id, the title of the article, and the text of the article. The training data contains a fourth column with the label. Participants need to submit a TSV file containing the id and the label.
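A submission file can be produced with the standard library `csv` module. The example below is a minimal sketch: the ids and labels are made up, and whether the file should include a header row is not specified here, so this version writes none.

```python
import csv

# Hypothetical predictions: article id -> predicted label.
# The four task labels are: true, partially false, false, other.
predictions = {
    "1043": "false",
    "1044": "true",
    "1045": "partially false",
}

def write_submission(preds, path="submission.tsv"):
    """Write a two-column TSV (id, label), one row per article."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for article_id, label in preds.items():
            writer.writerow([article_id, label])

write_submission(predictions)
```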

Evaluation

This task is evaluated as a classification task. We will use the F1-macro measure for the ranking of teams.

The evaluation script can be found here.
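For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores over the four labels. The sketch below implements it in plain Python; the official evaluation script (or `sklearn.metrics.f1_score` with `average='macro'`) should be treated as authoritative.

```python
LABELS = ["true", "partially false", "false", "other"]

def f1_macro(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 if both are 0).
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1s.append(f1)
    return sum(f1s) / len(f1s)
```

Note that a class absent from both gold and predictions contributes an F1 of 0 here, which penalises systems that collapse rare classes.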

Datasets

We are sharing the dataset only in the context of our task, which precludes its use for any commercial purpose. The training data is available via Zenodo. Remember to sign the data sharing agreement, otherwise you won't get access to the data.


The test data with gold labels is now available on Zenodo as well.

Data set creation

For the task, several fact-checking sites were analysed. For each checked claim, the original article was identified manually. The text of these published fake news documents was then added to the collection, together with the decision label from the fact-checking site. Because different sites use different labels, the labels were aggregated. (see following figure)
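Such an aggregation can be expressed as a simple lookup table. The mapping below is only an illustration: the "partially false" entries follow the category definition above, but the exact set of site-specific verdicts and their mapping used by the organisers may differ.

```python
# Hypothetical mapping from site-specific verdicts to the four task labels.
RAW_TO_TASK_LABEL = {
    "false": "false",
    "true": "true",
    "partially false": "partially false",
    "partially true": "partially false",
    "mostly true": "partially false",
    "miscaptioned": "partially false",
    "misleading": "partially false",
    "unproven": "other",
    "in dispute": "other",
}

def aggregate(raw_label):
    """Normalise a site-specific verdict to one of the four task labels."""
    return RAW_TO_TASK_LABEL.get(raw_label.strip().lower(), "other")
```

Unknown verdicts fall back to "other", matching its role as the catch-all category.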

Submission Guidelines

  • Make sure that you create one account per team, and submit through that account only.

  • We will keep the leaderboard private until the end of the submission period; hence, results will not be available upon submission. All results will be released after the evaluation period.

  • You are allowed to make a maximum of 200 submissions per day for each subtask.

  • The last file submitted to the leaderboard will be considered as the final submission.

  • It is mandatory for every team to fill out this survey: https://www.surveymonkey.com/r/LP2F7LC

  • The output file has to have a `.tsv` extension; otherwise, you will get an error on the leaderboard.

  • You have to zip the TSV file, e.g. `zip submission.zip path_to_tsv_file.tsv`, and submit it through the Codalab page.