Evaluation

Evaluation Plan

The task involves multi-class and multi-label classification. For each task, participating teams will submit runs that contain, for each datapoint in the test set, the ID of the tweet and the class(es) predicted by the team's model. Each participating team can submit up to 3 runs, e.g., from models with different hyperparameters.

The submitted runs will be ranked by their performance on the test dataset. The evaluation metric will be the standard macro-F1 score over the 12 classes, i.e., the unweighted mean of the per-class F1 scores.
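To make the metric concrete, the following is a minimal sketch of how a macro-F1 score can be computed for multi-label predictions. It is not the organizers' scoring script. The function name macro_f1, the tweet IDs, and the three class names are hypothetical and used only for illustration (the actual task has 12 classes), and treating a class with no gold and no predicted instances as F1 = 0 is an assumption.

```python
from typing import Dict, List, Set


def macro_f1(gold: Dict[str, Set[str]],
             pred: Dict[str, Set[str]],
             classes: List[str]) -> float:
    """Unweighted mean of per-class F1 scores over `classes`.

    `gold` and `pred` map a tweet ID to its set of class labels.
    A tweet ID missing from `pred` is treated as having no predicted labels.
    """
    f1_scores = []
    for c in classes:
        tp = fp = fn = 0
        for tweet_id, gold_labels in gold.items():
            pred_labels = pred.get(tweet_id, set())
            if c in gold_labels and c in pred_labels:
                tp += 1
            elif c in pred_labels:
                fp += 1
            elif c in gold_labels:
                fn += 1
        denom = 2 * tp + fp + fn
        # Assumption: a class that never occurs in gold or pred scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data with three classes instead of the task's 12.
classes = ["A", "B", "C"]
gold = {"t1": {"A"}, "t2": {"A", "B"}, "t3": {"C"}}
pred = {"t1": {"A"}, "t2": {"B"}, "t3": {"A"}}

score = macro_f1(gold, pred, classes)
print(f"macro-F1 = {score:.3f}")
```

In this toy example the per-class F1 scores are 0.5, 1.0, and 0.0. Because the macro average weights every class equally, a class that is rare in the test set affects the final score as much as a frequent one.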

Use of other data for training classifiers: Participating teams are free to use attributes of the tweets other than the text. Specifically, participants may crawl the tweets through the Twitter API using the tweet IDs (which we will provide) and then use other features, such as the user profiles. Participating teams are also allowed to use other publicly available datasets for training their models. If a team uses attributes other than the tweet text, or additional datasets, this should be clearly stated in the working notes submitted by that team.