TASK1
The performance of each system will be evaluated using two different rankings:
RANKING 1
The performance of each system will be evaluated using a partial scoring scheme.
Each document will be assigned:
0 Points if no correct answers were provided in the three dimensions to be classified;
1/3 Points if only one correct answer is provided by the system among all the dimensions to be classified;
2/3 Points if two correct answers are provided by the system among all the dimensions to be classified;
1 Point if all the answers provided by the system are correct.
The final score is the sum of the points achieved by the system across all documents, normalized with respect to the number of documents in the test set.
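The partial scoring scheme above can be sketched as follows. This is only an illustration, not the official evaluation script: the dimension names `dim_a`, `dim_b`, `dim_c` and the dictionary-per-document layout are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical names for the three dimensions; the real labels may differ.
DIMENSIONS = ("dim_a", "dim_b", "dim_c")


def partial_score(gold: List[Dict[str, str]], predicted: List[Dict[str, str]]) -> float:
    """Average per-document credit: 0, 1/3, 2/3 or 1 point per document."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    if not gold:
        raise ValueError("the test set must contain at least one document")

    total = 0.0
    for gold_doc, pred_doc in zip(gold, predicted):
        # Count how many of the three dimensions were answered correctly.
        correct = sum(1 for dim in DIMENSIONS if pred_doc.get(dim) == gold_doc[dim])
        total += correct / len(DIMENSIONS)

    # Normalize by the number of documents in the test set.
    return total / len(gold)
```

For instance, a system that gets every dimension right on one document and only one dimension right on a second document would score (1 + 1/3) / 2 = 2/3 under this sketch.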
RANKING 2
Each document will be assigned:
1 Point if all the answers provided by the system are correct;
0 otherwise.
The final score is the sum of the points achieved by the system across all documents, normalized with respect to the number of documents in the test set.
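The all-or-nothing scheme of Ranking 2 can be sketched in the same spirit. Again this is an illustrative assumption about the data layout (one dictionary of dimension labels per document), not the official evaluation script.

```python
from typing import Dict, List


def exact_match_score(gold: List[Dict[str, str]], predicted: List[Dict[str, str]]) -> float:
    """Fraction of documents for which every dimension is predicted correctly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    if not gold:
        raise ValueError("the test set must contain at least one document")

    hits = 0
    for gold_doc, pred_doc in zip(gold, predicted):
        # 1 point only if all dimensions in the gold annotation match; 0 otherwise.
        if all(pred_doc.get(dim) == label for dim, label in gold_doc.items()):
            hits += 1

    # Normalize by the number of documents in the test set.
    return hits / len(gold)
```

Under this sketch a document with two of three dimensions correct earns nothing, so the Ranking 2 score of a system can never exceed its Ranking 1 score.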
TASK2
For subtask 2a and subtask 2b, the micro-averaged F-score will be used as the scoring function.
Four different rankings will be provided:
Subtask 2a (in-domain);
Subtask 2a (out-domain);
Subtask 2b (in-domain);
Subtask 2b (out-domain).
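As a rough illustration of the micro-averaged F-score used for Task 2, the sketch below pools true positives, false positives and false negatives over all classes before computing precision and recall. It assumes one gold label and one predicted label per document, which is an assumption of this example; the official evaluation script may handle labels differently.

```python
from collections import Counter
from typing import List


def micro_f_score(gold: List[str], predicted: List[str]) -> float:
    """Micro-averaged F-score over single-label predictions."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")

    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for gold_label, pred_label in zip(gold, predicted):
        if gold_label == pred_label:
            tp[gold_label] += 1
        else:
            fp[pred_label] += 1  # predicted class was wrongly assigned
            fn[gold_label] += 1  # gold class was missed

    # Micro-averaging: sum the counts across classes, then compute P and R.
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0

    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)
```

Under the single-label assumption every error counts once as a false positive and once as a false negative, so the micro-averaged F-score coincides with plain accuracy.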
EVALUATION SCRIPT
Download the evaluation script here