Evaluation plan
Participating teams may take part in either one task or both.
Both tasks are standard multi-class classification problems. For each task, teams will submit runs as CSV files containing two columns: the tweetID of a tweet in the test set and the class predicted by the team's model. Each team may submit up to 3 runs per task, e.g., to compare different models.
For each task, the submitted runs will be ranked by their performance on the test dataset. Standard metrics, namely the overall accuracy and the macro-F1 score over the classes, will be used for evaluation.
Use of other data for training classifiers: Participating teams are free to use attributes of the tweets other than the text. In particular, participants may crawl the tweets using the tweetIDs we provide via the Twitter API, and then use additional features such as user profiles. Teams may also use other publicly available datasets for training their models. If a team uses attributes beyond the tweet text or additional datasets, this must be clearly stated in the working notes submitted by that team.
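For illustration, below is a minimal sketch of retrieving tweets (and author profiles) by tweetID, assuming the Tweepy library and a Twitter API v2 bearer token. The bearer token value and the input/output file names are placeholders, not part of the task data.

# Sketch: fetch tweet text and author profiles for a list of tweetIDs.
# Assumes the Tweepy library and a valid Twitter API v2 bearer token;
# BEARER_TOKEN and the file names below are placeholders.
import csv

import tweepy

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder
client = tweepy.Client(bearer_token=BEARER_TOKEN)

with open("test_tweet_ids.txt") as f:  # one tweetID per line (illustrative file)
    tweet_ids = [line.strip() for line in f if line.strip()]

rows = []
# The v2 tweet-lookup endpoint accepts at most 100 IDs per request.
for i in range(0, len(tweet_ids), 100):
    batch = tweet_ids[i:i + 100]
    resp = client.get_tweets(
        ids=batch,
        tweet_fields=["author_id", "created_at"],
        expansions=["author_id"],
        user_fields=["description", "public_metrics"],
    )
    users = {u.id: u for u in (resp.includes or {}).get("users", [])}
    for tweet in resp.data or []:
        user = users.get(tweet.author_id)
        rows.append({
            "id": tweet.id,
            "text": tweet.text,
            "user_description": user.description if user else "",
        })

with open("crawled_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "text", "user_description"])
    writer.writeheader()
    writer.writerows(rows)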
Output Format for Task 1 and Task 2
Each run submitted by a team should be a single CSV file containing two columns: the tweetID of a tweet and the class predicted by the team's classifier. The first few rows of a sample output file are shown below:
id,pred
1325682517148569600,AntiVax
1325768441370800128,Neutral
1325770677580918785,Neutral
1325770986571096064,ProVax
...
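A minimal sketch of producing a run file in this format follows, assuming the test tweetIDs and the model's predictions are available as parallel lists; the variable names and the output file name are illustrative.

# Sketch: write predictions in the required two-column submission format.
# test_ids and predictions are assumed to come from the team's own pipeline;
# "run1.csv" is an illustrative file name.
import pandas as pd

test_ids = [1325682517148569600, 1325768441370800128]  # tweetIDs from the test set
predictions = ["AntiVax", "Neutral"]                   # labels predicted by the model

submission = pd.DataFrame({"id": test_ids, "pred": predictions})
submission.to_csv("run1.csv", index=False)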
Evaluation
Participating teams will be ranked by their performance on the test dataset. The overall accuracy and the macro-F1 score over the classes will be used as evaluation metrics.
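For reference, here is a minimal sketch of computing these two metrics with scikit-learn, assuming the gold labels and a submitted run are both CSV files with the columns described above; the file names are illustrative.

# Sketch: score a run against gold labels using accuracy and macro-F1.
# "gold_labels.csv" and "run1.csv" are illustrative file names; both are
# assumed to contain the columns id,pred.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

gold = pd.read_csv("gold_labels.csv")  # columns: id, pred (true labels)
run = pd.read_csv("run1.csv")          # columns: id, pred (predicted labels)

merged = gold.merge(run, on="id", suffixes=("_true", "_pred"))
accuracy = accuracy_score(merged["pred_true"], merged["pred_pred"])
macro_f1 = f1_score(merged["pred_true"], merged["pred_pred"], average="macro")

print(f"Accuracy: {accuracy:.4f}")
print(f"Macro-F1: {macro_f1:.4f}")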