Evaluation

Participants will be ranked by F1 score. For the multi-class track and the binary track, the F1 score will be computed as a macro average (macro); for the multi-label track, it will be computed as a sample average (samples). Additionally, the following metrics will be computed: