Evaluation
Task 1
The initial gold standard for the tasks is generated through human annotation. We also plan to employ a pooling mechanism, i.e., manually checking the top-ranked tweets of all runs submitted to the track (as is commonly done in TREC tracks).
Standard IR measures such as Precision, Recall, MAP, and F-score will be used to evaluate the runs. In Task 1, higher credit will be given to runs that identify a larger number of claim-bearing / fact-checkable tweets. A sketch of the per-run ranking measure is given below.
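To make the ranking metric concrete, here is a minimal sketch of average precision, the per-query component of MAP, computed for one ranked run against a set of relevant tweet IDs. The function and variable names are illustrative and not part of any official scoring script.

def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list, given the set of relevant tweet IDs."""
    hits = 0
    precision_sum = 0.0
    for rank, tweet_id in enumerate(ranked_ids, start=1):
        if tweet_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of relevant tweets (0.0 if there are none)
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

# Example: two of the three relevant tweets appear in the ranking
print(average_precision([101, 102, 103], {101, 103, 104}))  # (1/1 + 2/3) / 3 = 0.556

MAP is then the mean of this value over all queries (or topics) in the test set.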
Task 2
Output Format
Each run submitted by a team should consist of one CSV file containing two columns: the tweet ID and the class predicted by your classifier. The first few rows of a sample output file are given below:
id,pred
1325682517148569600,AntiVax
1325768441370800128,Neutral
1325770677580918785,Neutral
1325770986571096064,ProVax
...
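A run file in this format can be produced with Python's standard csv module, as in the sketch below. The file name "run1.csv" and the predictions list are placeholders for your own submission name and classifier output.

import csv

# predictions: (tweet ID, predicted class) pairs from your classifier
predictions = [
    (1325682517148569600, "AntiVax"),
    (1325768441370800128, "Neutral"),
]

with open("run1.csv", "w", newline="") as f:   # "run1.csv" is a placeholder name
    writer = csv.writer(f)
    writer.writerow(["id", "pred"])            # header row, as in the sample above
    for tweet_id, label in predictions:
        writer.writerow([tweet_id, label])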
Evaluation
The participating teams will be ranked by their performance on the test dataset. Overall accuracy and the macro-averaged F1 score over the three classes (AntiVax, Neutral, ProVax) will be used as evaluation metrics.
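For reference, both metrics can be computed with scikit-learn as sketched below; the organizers' exact scoring script is not specified here, and the toy labels are illustrative only.

from sklearn.metrics import accuracy_score, f1_score

# Toy gold and predicted labels, aligned by tweet ID (illustrative only)
y_true = ["AntiVax", "Neutral", "ProVax", "Neutral"]
y_pred = ["AntiVax", "Neutral", "Neutral", "Neutral"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1

Because the macro-F1 averages the per-class F1 scores without weighting by class frequency, a run cannot score well by predicting only the majority class.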