Evaluation

Tracks

There will be three different tracks for the competition:

Note: A team may choose to compete in any one or more of the tracks. A team can submit at most two predictions for each track in which it participates (e.g., if a team decides to try two models for a track). It must be clearly specified at submission time which prediction is for which track.

Evaluation Metrics 

We will treat span extraction as a token classification task, where each token/word in a post should be labeled 1 (if the token is part of a claim) or 0 (if the token is not part of a claim). Thus the prediction for a given post will be a binary vector (consisting of 0s and 1s) of length equal to the number of tokens in the post. 
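As an illustration, the sketch below builds such a binary label vector from claim spans given as character offsets. It assumes simple whitespace tokenization and a hypothetical annotation format; the actual data format is defined by the released dataset.

```python
# Minimal sketch (assumed format, not the official one): convert character-offset
# claim spans into a per-token binary label vector, using whitespace tokenization.

def spans_to_labels(post: str, claim_spans: list[tuple[int, int]]) -> list[int]:
    """Label each whitespace token 1 if it overlaps any claim span, else 0."""
    labels = []
    offset = 0
    for token in post.split():
        start = post.index(token, offset)  # character start of this token
        end = start + len(token)           # character end (exclusive)
        offset = end
        in_claim = any(start < span_end and end > span_start
                       for span_start, span_end in claim_spans)
        labels.append(1 if in_claim else 0)
    return labels

# Hypothetical post and annotation for illustration only.
post = "Vitamin C cures the common cold according to this post"
claim_spans = [(0, 31)]  # covers "Vitamin C cures the common cold"
print(spans_to_labels(post, claim_spans))
# [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```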

The standard IOU/Jaccard and Macro-F1 scores will be calculated for the predictions on each post. These metric values, averaged over all the posts in the test set, will be used to evaluate a team.
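The following is a minimal sketch of how these per-post metrics could be computed and averaged over a test set. It is not the official evaluation script, and edge cases (e.g., posts with no claim tokens) may be handled differently there.

```python
# Sketch of per-post token-level Jaccard (IoU) and Macro-F1, averaged over posts.

def jaccard(gold: list[int], pred: list[int]) -> float:
    """IoU of claim-token positions: |gold ∩ pred| / |gold ∪ pred|."""
    g = {i for i, y in enumerate(gold) if y == 1}
    p = {i for i, y in enumerate(pred) if y == 1}
    if not g and not p:  # assumption: perfect score when both are empty
        return 1.0
    return len(g & p) / len(g | p)

def macro_f1(gold: list[int], pred: list[int]) -> float:
    """Unweighted mean of the F1 scores of the claim (1) and non-claim (0) classes."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for y, yh in zip(gold, pred) if y == cls and yh == cls)
        fp = sum(1 for y, yh in zip(gold, pred) if y != cls and yh == cls)
        fn = sum(1 for y, yh in zip(gold, pred) if y == cls and yh != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Each post contributes one Jaccard and one Macro-F1 value; the team score is the mean.
gold_posts = [[1, 1, 0, 0], [0, 1, 1, 1]]
pred_posts = [[1, 0, 0, 0], [0, 1, 1, 0]]
avg_jaccard = sum(jaccard(g, p) for g, p in zip(gold_posts, pred_posts)) / len(gold_posts)
avg_macro_f1 = sum(macro_f1(g, p) for g, p in zip(gold_posts, pred_posts)) / len(gold_posts)
print(round(avg_jaccard, 3), round(avg_macro_f1, 3))
```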

An evaluation script will be provided so that participants can compute these metrics from their models' outputs.

Team Ranking 

One winning team will be selected for each track, based on the following factors:

1. Performance of the prediction(s) as per the evaluation metrics

2. Novelty of the models used

3. Clarity of the report submitted