(EBDT-KG) Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph

Georgeta Bordea (1), Stefano Faralli (2), Fleur Mougin (1), Paul Buitelaar (3), Gayo Diallo (1)

(1) University of Bordeaux, Inserm, Bordeaux, Population Health Research Center, France

(2) University of Rome Unitelma Sapienza, Italy

(3) Data Science Institute, NUI Galway, Ireland

We present an evaluation framework that enables comparison against gold/silver standard domain-specific taxonomies extracted from the Wikipedia knowledge graph for three domains: DRUGS, FOODS and PLANTS. The benchmark can be adapted to other domains of interest with minimal effort.

Dataset

The three domain taxonomies are available as three TSV files:

An interactive view is available here.
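The column layout of the TSV files is not described on this page, so the following is only a minimal sketch. It assumes that each row holds one hyponym/hypernym pair, with the child in the first column and the parent in the second. The sketch loads such a file into a set of edges and scores a candidate taxonomy against a gold/silver reference using edge-level precision, recall and F1. This metric is an illustrative choice and not necessarily the one used by the framework. The names `load_edges` and `edge_scores` are hypothetical.

```python
import csv
import io
from typing import Iterable, Set, Tuple

# Assumed row layout: hyponym <TAB> hypernym (hypothetical; check the actual files).
Edge = Tuple[str, str]


def load_edges(handle: Iterable[str]) -> Set[Edge]:
    """Read tab-separated (hyponym, hypernym) pairs, skipping blank or short rows."""
    edges: Set[Edge] = set()
    # QUOTE_NONE keeps quote characters in category labels as literal text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            edges.add((row[0].strip(), row[1].strip()))
    return edges


def edge_scores(predicted: Set[Edge], gold: Set[Edge]) -> Tuple[float, float, float]:
    """Edge-level precision, recall and F1 of a predicted taxonomy against a reference."""
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    # When overlap > 0, both precision and recall are positive, so no division by zero.
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Toy example with in-memory data (not taken from the released files).
gold = load_edges(io.StringIO("aspirin\tanalgesic\nibuprofen\tanalgesic\nanalgesic\tdrug\n"))
pred = load_edges(io.StringIO("aspirin\tanalgesic\nibuprofen\tdrug\n"))
print(edge_scores(pred, gold))  # precision 0.5, recall 1/3, F1 0.4
```

To score a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`. The file paths depend on the released resources and are not given here.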


Additionally, we provide the evaluation sheets produced by three domain experts to assess precision on a sample of removed and surviving nodes:
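The layout of the evaluation sheets is not specified here, so the sketch below assumes a hypothetical record for each sampled node. Each record contains the node label, whether the pruning step removed the node or let it survive, and one boolean vote per expert (True if the expert judges the node to belong to the domain). The sketch combines the three experts by strict majority. For each group, it reports the share of nodes where the experts agree with the system decision. Both the majority vote and this definition of precision are illustrative assumptions and do not reproduce the authors' exact protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (node_label, system_decision, expert_votes).
# system_decision is "removed" or "survived"; each vote is True when the
# expert considers the node relevant to the target domain.
Judgment = Tuple[str, str, List[bool]]


def sample_precision(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Per-decision precision over an expert-judged sample.

    A surviving node counts as correct if the expert majority finds it relevant;
    a removed node counts as correct if the majority finds it not relevant.
    """
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _node, decision, votes in judgments:
        if decision not in ("removed", "survived"):
            raise ValueError(f"unknown decision: {decision!r}")
        relevant = sum(votes) * 2 > len(votes)  # strict majority of experts
        correct = relevant if decision == "survived" else not relevant
        total[decision] += 1
        agree[decision] += int(correct)
    return {d: agree[d] / total[d] for d in total}


# Toy sample for illustration only (not data from the released sheets).
sample = [
    ("Aspirin", "survived", [True, True, True]),
    ("Pharmacists", "survived", [False, True, False]),
    ("Drug policy", "removed", [False, False, True]),
    ("Pharmacy schools", "removed", [False, False, False]),
]
print(sample_precision(sample))  # {'survived': 0.5, 'removed': 1.0}
```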

The initial noisy taxonomy graph:

More information and resources about the evaluation framework can be found at:

License

The resource is publicly available under a CC BY 4.0 Licence.

Contacts

Georgeta Bordea: georgeta.bordea@u-bordeaux.fr

Stefano Faralli: stefano.faralli@unitelmasapienza.it

Last update: 3 December 2019