We evaluate each task setting independently by comparing the results provided by participants to the gold standard annotations of the dataset.
For each setting, we compute Accuracy, Precision, Recall and F-Score.
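As an illustration, the four measures could be computed from the gold and predicted labels roughly as follows. This is only a sketch: it assumes binary labels with 1 as the positive class and reads F-Score as the balanced F1. The function name `binary_scores` is hypothetical and is not part of any official scorer.

```python
def binary_scores(gold, pred, positive=1):
    """Accuracy, precision, recall and F1 for two aligned lists of binary labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Fall back to 0.0 whenever a denominator is zero (assumption of this sketch).
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Toy labels, invented for illustration only.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
scores = binary_scores(gold, pred)
```

Because the two lists are compared position by position, the order of the predictions has to match the order of the gold annotations.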
The baseline score will be computed with a Linear Support Vector Machine (Linear SVM) that takes two binary features as input for each concept pair: the first is 1 if concept A appears in the Wiki page of concept B and 0 otherwise, and the second is 1 if concept B appears in the Wiki page of concept A and 0 otherwise. For instance, for the concept pair (Quadrilatero, Segmento), if "segmento" appears in the "Quadrilatero" Wiki page but "quadrilatero" does not appear in the "Segmento" Wiki page, the baseline takes as input I={0,1}.
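The baseline input for one pair could be built along these lines. This is a minimal sketch: it assumes a case-insensitive substring match, which may differ from the matching actually used by the organizers. The helper name `baseline_features` is hypothetical, and the two page texts are short made-up strings, not real Wiki content.

```python
def baseline_features(concept_a, concept_b, page_a, page_b):
    """Return [A appears in B's page, B appears in A's page] as 0/1 values."""
    # Assumption: simple case-insensitive substring matching.
    a_in_b = int(concept_a.lower() in page_b.lower())
    b_in_a = int(concept_b.lower() in page_a.lower())
    return [a_in_b, b_in_a]


# Invented toy texts standing in for the two Wiki pages.
page_quadrilatero = "Un quadrilatero è un poligono con quattro lati; ogni lato è un segmento."
page_segmento = "Un segmento è la parte di retta compresa tra due punti."

features = baseline_features("Quadrilatero", "Segmento",
                             page_quadrilatero, page_segmento)
```

For the toy texts above, the result is `[0, 1]`, matching the example input. Vectors of this kind, one per pair, could then be passed to a linear SVM implementation such as scikit-learn's `LinearSVC`, though the source does not say which implementation is used.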
For each setting, participants should submit an output file with the predicted labels of the pairs (concept_A, concept_B) in the test set, listed in the same order as in the test set.
The predicted label can take one of two values:
1 if concept_B is a prerequisite of concept_A;
0 otherwise.