(EBDT-KG) Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea (1), Stefano Faralli (2), Fleur Mougin (1), Paul Buitelaar (3), Gayo Diallo (1)
(1) University of Bordeaux, Inserm, Bordeaux, Population Health Research Center, France
(2) University of Rome Unitelma Sapienza, Italy
(3) Data Science Institute, NUI Galway, Ireland
We present an evaluation framework for comparing extracted taxonomies against gold/silver standard domain-specific taxonomies derived from the Wikipedia knowledge graph for the DRUGS, FOODS and PLANTS domains. The benchmark can be adapted to other domains of interest with minimal effort.
Dataset
The three domain taxonomies are available as three TSV files:
Interactive view available here.
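The column layout of the TSV files is not described on this page. As a minimal sketch, assuming each row holds a child term and its parent term separated by a tab, the edges could be loaded as follows. The function names `read_edges` and `children_index` and the sample rows are hypothetical and only illustrate the idea; adjust the column positions to the actual files.

```python
import csv
import io
from collections import defaultdict


def read_edges(handle):
    """Read (child, parent) pairs from tab-separated rows.

    Assumption: column 0 is the child term and column 1 is the parent term.
    Rows with fewer than two columns or with empty terms are skipped.
    """
    edges = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        child, parent = row[0].strip(), row[1].strip()
        if child and parent:
            edges.add((child, parent))
    return edges


def children_index(edges):
    """Map each parent term to the set of its direct children."""
    index = defaultdict(set)
    for child, parent in edges:
        index[parent].add(child)
    return index


# Made-up sample rows standing in for one of the taxonomy files.
sample = "aspirin\tanalgesics\nibuprofen\tanalgesics\nanalgesics\tdrugs\n"
edges = read_edges(io.StringIO(sample))
index = children_index(edges)
print(sorted(index["analgesics"]))
```

To read a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`.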
Additionally, we provide the evaluation sheets produced by three domain experts to assess the precision of a sample of removed and surviving nodes:
- annotations for the DRUGS taxonomy;
- annotations for the FOODS taxonomy;
- annotations for the PLANTS taxonomy.
The initial noisy taxonomy graph:
More information and resources about the evaluation framework can be found at:
- SemEval 2015 task 17: http://alt.qcri.org/semeval2015/task17/
- SemEval 2016 task 13: http://alt.qcri.org/semeval2016/task13/
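One common way to compare an extracted taxonomy against a gold or silver standard is to score it at the edge level with precision, recall and F1. The sketch below illustrates that idea under the assumption that both taxonomies are given as sets of (child, parent) pairs; it is not the official scorer of either task, and `edge_scores` and the toy edges are hypothetical.

```python
def edge_scores(system_edges, gold_edges):
    """Return (precision, recall, f1) of system edges against gold edges.

    Both arguments are iterables of (child, parent) tuples. An edge counts
    as correct only if the exact same pair appears in the gold standard.
    """
    system, gold = set(system_edges), set(gold_edges)
    common = system & gold
    precision = len(common) / len(system) if system else 0.0
    recall = len(common) / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: two of the three system edges appear in the gold standard.
gold = {("apple", "fruit"), ("pear", "fruit"), ("fruit", "food"), ("bread", "food")}
system = {("apple", "fruit"), ("fruit", "food"), ("apple", "food")}
precision, recall, f1 = edge_scores(system, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching on term pairs is the simplest choice; it treats spelling variants of the same term as different nodes, so terms may need normalising before comparison.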
License
The resource is publicly available under a CC BY 4.0 license.
Contacts
Georgeta Bordea: georgeta.bordea@u-bordeaux.fr
Stefano Faralli: stefano.faralli@unitelmasapienza.it
Last update: 3 December 2019