Evaluation
Participant systems will be evaluated and ranked by F1-score. For the multi-class track the F1-score will be computed as a macro average (macro), for the multi-label track as a sample average (samples), and for the binary track as a macro average (macro). Additionally, the following metrics will be computed:
Track 1:
F1-score (macro-average)
Precision (macro-average)
Recall (macro-average)
Track 2:
F1-score (samples)
Hamming loss
Exact match ratio
Track 3:
F1-score (macro-average)
Precision (macro-average)
Recall (macro-average)
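As a rough illustration of how these metrics behave, the sketch below implements macro-averaged F1, sample-averaged F1, Hamming loss, and exact match ratio in plain Python. This is not the official evaluation script; the function names and the label representations (class strings for the single-label tracks, label sets for the multi-label track) are assumptions for the example.

```python
# Illustrative metric implementations; not the official scorer.

def macro_f1(y_true, y_pred, labels):
    # Per-class F1, then an unweighted (macro) average over classes.
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def samples_f1(Y_true, Y_pred):
    # F1 computed per example over its predicted label set,
    # then averaged over examples (samples average).
    scores = []
    for t, p in zip(Y_true, Y_pred):
        t, p = set(t), set(p)
        if not t and not p:
            scores.append(1.0)
            continue
        inter = len(t & p)
        prec = inter / len(p) if p else 0.0
        rec = inter / len(t) if t else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def hamming_loss(Y_true, Y_pred, n_labels):
    # Fraction of label slots that are predicted incorrectly.
    wrong = sum(len(set(t) ^ set(p)) for t, p in zip(Y_true, Y_pred))
    return wrong / (len(Y_true) * n_labels)

def exact_match(Y_true, Y_pred):
    # Fraction of examples whose entire label set is predicted exactly.
    return sum(set(t) == set(p) for t, p in zip(Y_true, Y_pred)) / len(Y_true)
```

These definitions are intended to mirror scikit-learn's `f1_score(average="macro")`, `f1_score(average="samples")`, `hamming_loss`, and subset accuracy, which are the common reference implementations for such metrics.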