Automatic extraction of Task, Dataset, and Metric from Scholarly Articles for building Benchmarks

About

ORKG Benchmarks organize state-of-the-art empirical research by the research problems (e.g., Named Entity Recognition) addressed, and are powered in part by automated information extraction within a human-in-the-loop curation model. A Benchmark comprises: a research problem or Task that is addressed; a Dataset that encapsulates the research problem and on which performance is evaluated; a Method or Model proposed for the task, or concretely for the dataset; an evaluation Metric; and the Score the model achieves on that metric. A concrete example of a Benchmark for the Text Summarization task on the Gigaword dataset, comparing various models, can be found at https://www.orkg.org/orkg/benchmark/R124737/problem/R124682.
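The benchmark components described above can be sketched as a simple record; this is a minimal illustration only, and the field names are ours, not the ORKG schema:

```python
from dataclasses import dataclass

@dataclass
class Benchmark:
    """One benchmark record: a (Task, Dataset, Metric) tuple plus the
    model evaluated and the score it achieved. Field names are
    illustrative, not the ORKG data model."""
    task: str      # research problem, e.g. "Text Summarization"
    dataset: str   # dataset the task is evaluated on, e.g. "Gigaword"
    model: str     # method/model proposed for the task
    metric: str    # evaluation metric, e.g. "ROUGE-1"
    score: float   # value of the metric for this model

# Hypothetical example values in the spirit of the Gigaword benchmark above
record = Benchmark("Text Summarization", "Gigaword", "ExampleModel", "ROUGE-1", 39.1)
```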


Best paper at ICADL 2021

This study presented a comprehensive approach for generating Benchmarks for knowledge-graph-based scholarly information organization. Specifically, the problem of automated Benchmark construction was investigated using state-of-the-art transformer models, viz. BERT, SciBERT, and XLNet. Our analysis revealed an optimal approach that significantly outperformed existing baselines, achieving F1 scores above 90% and setting a new state of the art for benchmark extraction. As a result, a vast share of empirical AI research can be organized in next-generation digital libraries as knowledge graphs. For more information, please see our paper.
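For reference, the F1 score used in the evaluation is the harmonic mean of precision and recall. A minimal sketch of the standard computation (not the paper's evaluation code) from true-positive, false-positive, and false-negative counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. 90 true positives, 10 false positives, 10 false negatives
# gives precision = recall = 0.9, hence F1 = 0.9
print(f1_score(90, 10, 10))  # -> 0.9
```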

Salomon Kabongo, Jennifer D’Souza, and Sören Auer (2021). Automated Mining of Leaderboards for Empirical AI Research. In: Ke, H.R., Lee, C.S., Sugiyama, K. (eds.) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol. 13133. Springer, Cham. (Pre-print available at https://arxiv.org/abs/2109.13089)


A pictorial depiction of the overall model architecture described in the paper is shown below.

Funding Statement

This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).