Evaluation

Metrics

We evaluate each task setting independently by comparing the labels predicted by participants against the gold-standard annotations of the dataset.

For each setting, we compute Accuracy, Precision, Recall and F-Score.
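The four metrics can be sketched as follows for a single setting. This is a minimal illustration, not the official scorer: it assumes binary labels, treats label 1 (prerequisite relation present) as the positive class, and reads "F-Score" as F1 (β = 1). The official evaluation may use a different averaging scheme (e.g. macro-averaging over both classes). The function name `evaluate` is hypothetical.

```python
def evaluate(gold, pred):
    """Compute Accuracy, Precision, Recall and F1 for binary 0/1 labels.

    Assumption: label 1 is the positive class and F-Score means F1.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Confusion-matrix counts with 1 as the positive class.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)

    accuracy = (tp + tn) / len(gold)
    # Guard against division by zero when there are no positive predictions/labels.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [1, 0, 1, 1, 0]
    pred = [1, 0, 0, 1, 1]
    # accuracy = 0.6; precision, recall and f1 are all 2/3
    print(evaluate(gold, pred))
```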

Baseline

The baseline score will be computed with a Linear Support Vector Machine (Linear SVM) whose input consists of two binary features: the first is 1 if concept A appears in the Wiki page of concept B (0 otherwise), and the second is 1 if concept B appears in the Wiki page of concept A (0 otherwise). For instance, for the concept pair (Quadrilatero, Segmento), if "segmento" appears in the "Quadrilatero" Wiki page but "quadrilatero" does not appear in the "Segmento" page, the baseline takes as input I = (0, 1).
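The two baseline features can be sketched as below. This assumes a naive case-insensitive substring match of the concept name against the raw page text; the actual baseline may tokenize or normalize differently. The function name `baseline_features` and the toy page texts are hypothetical, written only to reproduce the (Quadrilatero, Segmento) example.

```python
def baseline_features(concept_a, concept_b, pages):
    """Return (a_in_b, b_in_a) for the pair (concept_a, concept_b).

    a_in_b is 1 if concept_a occurs in the page of concept_b, else 0;
    b_in_a is 1 if concept_b occurs in the page of concept_a, else 0.
    Assumption: plain case-insensitive substring matching.
    """
    text_a = pages[concept_a].lower()
    text_b = pages[concept_b].lower()
    a_in_b = 1 if concept_a.lower() in text_b else 0
    b_in_a = 1 if concept_b.lower() in text_a else 0
    return (a_in_b, b_in_a)


# Hypothetical toy page texts, not taken from the dataset.
PAGES = {
    "Quadrilatero": "Un quadrilatero è un poligono con quattro lati; ogni lato è un segmento.",
    "Segmento": "Un segmento è la parte di una retta compresa tra due punti.",
}

if __name__ == "__main__":
    print(baseline_features("Quadrilatero", "Segmento", PAGES))  # (0, 1)
```

The resulting feature vectors could then be fed to any linear SVM implementation (for example scikit-learn's `LinearSVC`) together with the gold labels of the training pairs.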

Submission Format

For each setting, participants should submit an output file with the predicted labels of the pairs (concept_A, concept_B) in the test set, in the same order as the pairs appear in the test set.

The predicted label can take one of two values:

  • 1 if concept_B is a prerequisite of concept_A;

  • 0 otherwise.
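A submission file can be produced and checked with a sketch like the following. The exact file layout is not specified above; this assumes one label per line, in test-set order, with no header. The helper names `write_submission` and `read_submission` and the file name are hypothetical.

```python
from pathlib import Path


def write_submission(path, labels):
    """Write one 0/1 label per line, preserving the test-set order.

    Assumption: plain text, one label per line, no header.
    """
    for i, label in enumerate(labels):
        if label not in (0, 1):
            raise ValueError(f"invalid label {label!r} at position {i}")
    Path(path).write_text("".join(f"{label}\n" for label in labels), encoding="utf-8")


def read_submission(path):
    """Read labels back, skipping blank lines, to sanity-check the file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [int(line) for line in lines if line.strip()]


if __name__ == "__main__":
    write_submission("submission_example.txt", [1, 0, 1])
    print(read_submission("submission_example.txt"))  # [1, 0, 1]
```

Reading the file back and comparing its length with the number of test pairs is a cheap way to catch dropped or reordered lines before submitting.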