Evaluation

The same evaluation strategy is used for both tasks.

Systems will be evaluated using standard metrics: precision, recall, and F1-score. Submissions will be ranked by macro F1-score. The metrics will be computed as follows:

Precision
Recall
Macro F1-score
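
As a rough illustration of how these three metrics are conventionally defined, the following minimal Python sketch computes per-class precision, recall, and F1, then averages the per-class F1 values to obtain a macro F1-score. It is not the official scoring program: the function names (per_class_scores, macro_f1), the one-label-per-item input format, and the choice to score a class as 0.0 when its denominator is zero are all assumptions made for this example, and the official scripts may handle these cases differently.

```python
from collections import Counter


def per_class_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for single-label predictions.

    Hypothetical helper for illustration; assumes gold and pred are
    equal-length sequences holding one label per item.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the gold label but was missed

    scores = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (0.0 for empty input)."""
    scores = per_class_scores(gold, pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

For example, calling macro_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"]) averages the F1 of class A and the F1 of class B with equal weight, regardless of how many items each class has.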

Scoring program

The evaluation scripts are available below:

Task1
Task2

Evaluation data

During the 'development' phase, the prediction files submitted by participants will be evaluated against the gold development (dev) set.

During the 'evaluation' phase, the prediction files submitted by participants will be evaluated against the gold test set.