The metrics used to evaluate entries are Macro-Precision, Macro-Recall, and Macro-F1score, as implemented in Scikit-learn:
For each class, the predicted label of every sample is assigned to one of four categories: TP (True Positive), TN (True Negative), FP (False Positive), or FN (False Negative). Recall, Precision, and F1-score are then computed per class as follows:
Recall = TP/(TP+FN)
Precision = TP/(TP+FP)
F1-score = (2 * Precision * Recall) / (Precision + Recall)
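Macro-Recall = (∑ Recall) / N, where the sum runs over the per-class Recall values and N is the number of classes
Macro-Precision = (∑ Precision) / N, where the sum runs over the per-class Precision values and N is the number of classes
Macro-F1score = (∑ F1-score) / N, where the sum runs over the per-class F1-score values and N is the number of classes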
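
The formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the official scoring script: the function name `macro_scores` and the toy labels are hypothetical, and it assumes that a class with a zero denominator contributes a score of 0.

```python
def macro_scores(y_true, y_pred):
    """Return (macro_precision, macro_recall, macro_f1) for two label lists."""
    # Assumption: the set of classes is every label seen in either list.
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Assumption: a zero denominator yields a score of 0 for that class.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    # Macro average: unweighted mean of the per-class scores.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels for a 3-class problem (not competition data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(macro_p, macro_r, macro_f1)
```

In Scikit-learn, the corresponding calls are `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics`, each with `average="macro"`.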