Evaluation

Both tasks are evaluated using the same strategy.

Systems will be evaluated using standard evaluation metrics, including precision, recall, and F1-score. The submissions will be ranked by F1-score. The metrics will be computed as follows:
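A minimal sketch of these metrics under their standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name `precision_recall_f1`, the list-of-labels input, and the single `positive` class are assumptions made for illustration; the official scoring program may use a different input format or averaging scheme (for example micro or macro averaging over several classes).

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[Hashable],
    pred: Sequence[Hashable],
    positive: Hashable = 1,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one positive class.

    Hypothetical helper: assumes gold and pred are aligned label lists.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if p == positive and g == positive)
    fp = sum(1 for g, p in zip(gold, pred) if p == positive and g != positive)
    fn = sum(1 for g, p in zip(gold, pred) if p != positive and g == positive)

    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    gold_labels = [1, 0, 1, 1, 0]
    pred_labels = [1, 1, 0, 1, 0]
    p, r, f = precision_recall_f1(gold_labels, pred_labels)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Treating undefined ratios as 0.0 is one common convention; the official scripts may handle that case differently.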

Scoring program

The evaluation scripts are available below:

Evaluation data

During the 'development' phase, the prediction files submitted by participants will be evaluated against the gold dev set.

During the 'evaluation' phase, the prediction files submitted by participants will be evaluated against the gold test set.