Evaluation

Tracks

There will be three different tracks for the competition:

Note: A team may choose to compete in any one or more of the tracks. A team can submit at most two predictions for each track in which it participates (e.g., if a team decides to try two models for a track). It must be clearly specified at submission time which prediction is for which track.

Evaluation Metrics 

We will treat span extraction as a token classification task, where each token/word in a post should be labeled 1 (if the token is part of a claim) or 0 (if the token is not part of a claim). Thus the prediction for a given post will be a binary vector (consisting of 0s and 1s) of length equal to the number of tokens in the post. 
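As an illustration, the sketch below builds such a binary label vector from claim spans given as character offsets. It assumes simple whitespace tokenization and a hypothetical annotation format; the actual data format is defined by the released dataset.

```python
# Minimal sketch (assumed format, not the official one): convert character-offset
# claim spans into a per-token binary label vector, using whitespace tokenization.

def spans_to_labels(post: str, claim_spans: list[tuple[int, int]]) -> list[int]:
    """Label each whitespace token 1 if it overlaps any claim span, else 0."""
    labels = []
    offset = 0
    for token in post.split():
        start = post.index(token, offset)  # character start of this token
        end = start + len(token)           # character end (exclusive)
        offset = end
        in_claim = any(start < span_end and end > span_start
                       for span_start, span_end in claim_spans)
        labels.append(1 if in_claim else 0)
    return labels

# Hypothetical post and annotation for illustration only.
post = "Vitamin C cures the common cold according to this post"
claim_spans = [(0, 31)]  # covers "Vitamin C cures the common cold"
print(spans_to_labels(post, claim_spans))
# [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```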

The standard IOU/Jaccard and Macro-F1 scores will be calculated for the predictions on each post. These metric values, averaged over all the posts in the test set, will be used to evaluate a team.
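The following is a minimal sketch of how these per-post metrics could be computed and averaged over a test set. It is not the official evaluation script, and edge cases (e.g., posts with no claim tokens) may be handled differently there.

```python
# Sketch of per-post token-level Jaccard (IoU) and Macro-F1, averaged over posts.

def jaccard(gold: list[int], pred: list[int]) -> float:
    """IoU of claim-token positions: |gold ∩ pred| / |gold ∪ pred|."""
    g = {i for i, y in enumerate(gold) if y == 1}
    p = {i for i, y in enumerate(pred) if y == 1}
    if not g and not p:  # assumption: perfect score when both are empty
        return 1.0
    return len(g & p) / len(g | p)

def macro_f1(gold: list[int], pred: list[int]) -> float:
    """Unweighted mean of the F1 scores of the claim (1) and non-claim (0) classes."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for y, yh in zip(gold, pred) if y == cls and yh == cls)
        fp = sum(1 for y, yh in zip(gold, pred) if y != cls and yh == cls)
        fn = sum(1 for y, yh in zip(gold, pred) if y == cls and yh != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Each post contributes one Jaccard and one Macro-F1 value; the team score is the mean.
gold_posts = [[1, 1, 0, 0], [0, 1, 1, 1]]
pred_posts = [[1, 0, 0, 0], [0, 1, 1, 0]]
avg_jaccard = sum(jaccard(g, p) for g, p in zip(gold_posts, pred_posts)) / len(gold_posts)
avg_macro_f1 = sum(macro_f1(g, p) for g, p in zip(gold_posts, pred_posts)) / len(gold_posts)
print(round(avg_jaccard, 3), round(avg_macro_f1, 3))
```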

An evaluation script will be provided so that participants can compute these metrics from their models' outputs.

Team Ranking 

One winning team will be selected for each track, based on the following factors:

1. Performance of the prediction(s) as per the evaluation metrics

2. Novelty of the models used

3. Clarity of the report submitted