Datasets & Tools

Arabic Datasets

All Arabic datasets will be added gradually to the repository here.

Task 1:

- CT20-AR-Train-T1: the dataset includes 3 training topics, 1,500 tweets, and the corresponding check-worthiness labels.
- CT20-AR-Test-T1: the dataset includes 12 testing topics, 6,000 tweets, and the corresponding check-worthiness labels.

Task 3:

- Training can be done using the training and test datasets for sub-task C of the 2019 edition of the lab.

Task 4:

- Training can be done using the training and test datasets for sub-task D of the 2019 edition of the lab.

English Datasets

Task 1:

- CT20-EN-Train-T1: the dataset includes 1 training topic, 488 tweets, and the corresponding check-worthiness labels.
- CT20-EN-Test-T1: 140 tweets on the same topic as the training data have been released as test data.

Task 2:

- CT20-EN-Train-T2: the dataset includes 1,003 tweets and the corresponding 10,373 verified claims.
- CT20-EN-Test-T2: 200 tweets, to be matched against the 10,373 already-verified claims, have been released as test data.

Task 5:

- CT20-EN-Train-T5: the dataset includes 50 fact-checked documents (debates, speeches, press conferences, etc.).
- CT20-EN-Test-T5: 20 debates have been released as test data.
