Evaluation
Participant systems will be evaluated and ranked by F1-score. For the multi-class track the F1-score will be computed as a macro average (macro), for the multi-label track as a sample average (samples), and for the binary track as a macro average (macro). Additionally, the following metrics will be computed:
Track 1:
F1-score (macro-average)
Precision (macro-average)
Recall (macro-average)
Track 2:
F1-score (samples)
Hamming loss
Exact match ratio
Track 3:
F1-score (macro-average)
Precision (macro-average)
Recall (macro-average)
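As a rough illustration of how these metrics behave, the sketch below implements macro-averaged F1, sample-averaged F1, Hamming loss, and exact match ratio in plain Python. This is not the official evaluation script; the function names and the label representations (class strings for the single-label tracks, label sets for the multi-label track) are assumptions for the example.

```python
# Illustrative metric implementations; not the official scorer.

def macro_f1(y_true, y_pred, labels):
    # Per-class F1, then an unweighted (macro) average over classes.
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def samples_f1(Y_true, Y_pred):
    # F1 computed per example over its predicted label set,
    # then averaged over examples (samples average).
    scores = []
    for t, p in zip(Y_true, Y_pred):
        t, p = set(t), set(p)
        if not t and not p:
            scores.append(1.0)
            continue
        inter = len(t & p)
        prec = inter / len(p) if p else 0.0
        rec = inter / len(t) if t else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def hamming_loss(Y_true, Y_pred, n_labels):
    # Fraction of label slots that are predicted incorrectly.
    wrong = sum(len(set(t) ^ set(p)) for t, p in zip(Y_true, Y_pred))
    return wrong / (len(Y_true) * n_labels)

def exact_match(Y_true, Y_pred):
    # Fraction of examples whose entire label set is predicted exactly.
    return sum(set(t) == set(p) for t, p in zip(Y_true, Y_pred)) / len(Y_true)
```

These definitions are intended to mirror scikit-learn's `f1_score(average="macro")`, `f1_score(average="samples")`, `hamming_loss`, and subset accuracy, which are the common reference implementations for such metrics.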