NLP Contribution Graph: Structuring NLP Contributions in the Open Research Knowledge Graph

About

NLPContributionGraph is a novel scheme that formalizes the annotation of NLP research contributions, with the goal of integrating them into a knowledge graph, specifically the Open Research Knowledge Graph. The scheme was first instantiated on a dataset of full-text scholarly articles, which was later released as a SemEval 2021 community-wide shared task.

Early Developments

The annotation scheme was conceptualized through a small-scale dataset annotation exercise. The structured contribution annotations were defined as:

1. Contribution sentences: a set of sentences about the contribution in the article;
2. Scientific terms and relations: a set of scientific terms and relational cue phrases extracted from the contribution sentences; and
3. Triples: semantic statements that pair scientific terms with a relation, modeled as subject-predicate-object RDF statements for KG building.

The triples are organized under three or more information units, three of which are mandatory, drawn from: ResearchProblem, Approach, Model, Code, Dataset, ExperimentalSetup, Hyperparameters, Baselines, Results, Tasks, Experiments, and AblationAnalysis.
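To make the three annotation layers and their grouping into information units concrete, here is a minimal Python sketch of one possible in-memory representation. The class names (`Triple`, `ContributionAnnotation`), the `to_ntriples` helper, the `http://example.org/ncg/` namespace, and the example sentence and triples are all hypothetical illustrations, not the released dataset's file format or an ORKG API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import quote

# Information units as listed in the scheme description.
INFORMATION_UNITS = frozenset({
    "ResearchProblem", "Approach", "Model", "Code", "Dataset",
    "ExperimentalSetup", "Hyperparameters", "Baselines", "Results",
    "Tasks", "Experiments", "AblationAnalysis",
})


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


@dataclass
class ContributionAnnotation:
    """Hypothetical container for one article's annotations (not the official format)."""
    sentences: list[str]   # 1. contribution sentences
    phrases: set[str]      # 2. scientific terms and relational cue phrases
    triples: dict[str, list[Triple]] = field(default_factory=dict)  # 3. triples keyed by unit

    def add_triple(self, unit: str, triple: Triple) -> None:
        # Reject unit names outside the scheme's inventory.
        if unit not in INFORMATION_UNITS:
            raise ValueError(f"unknown information unit: {unit!r}")
        self.triples.setdefault(unit, []).append(triple)

    def has_enough_units(self) -> bool:
        # The scheme organizes triples under three or more information units.
        return len(self.triples) >= 3

    def to_ntriples(self, base: str = "http://example.org/ncg/") -> list[str]:
        """Serialize triples as N-Triples lines under a hypothetical namespace."""
        def iri(text: str) -> str:
            # Placeholder IRI minting: spaces become underscores, the rest is percent-encoded.
            return f"<{base}{quote(text.replace(' ', '_'))}>"
        return [
            f"{iri(t.subject)} {iri(t.predicate)} {iri(t.obj)} ."
            for unit in sorted(self.triples)
            for t in self.triples[unit]
        ]


if __name__ == "__main__":
    # Illustrative values only.
    ann = ContributionAnnotation(
        sentences=["We propose a BiLSTM-CRF model for named entity recognition."],
        phrases={"BiLSTM-CRF model", "named entity recognition"},
    )
    ann.add_triple("Model", Triple("Contribution", "has", "BiLSTM-CRF model"))
    ann.add_triple("ResearchProblem",
                   Triple("Contribution", "has research problem", "named entity recognition"))
    print("\n".join(ann.to_ntriples()))
```

Keeping triples keyed by information unit mirrors the scheme's grouping, so a check such as `has_enough_units()` reduces to a dictionary length test. Minting IRIs by replacing spaces with underscores is only a placeholder; an actual KG import would map phrases to existing resources and predicates.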

Details of the preliminary annotation exercise are described in the following paper.

Jennifer D'Souza and Sören Auer (2020). NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature. In Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2020) co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020), Virtual Event, China, August 1.

The annotation scheme was then revisited to measure intra-annotator agreement and was subsequently revised. Details are in the following paper.

Jennifer D’Souza and Sören Auer (2021). Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset. Journal of Data and Information Science.

Later as a SemEval 2021 Shared Task

The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. ‘the NCG task’) tasked participants with developing automated systems that structure contributions from English-language NLP scholarly articles. As the first task of its kind in the SemEval series, it released structured data from NLP scholarly articles at three levels of information granularity: sentences, phrases, and phrases organized as triples toward Knowledge Graph (KG) building.
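Because the data are released at three granularity levels, a system's output can be compared with gold annotations level by level. The sketch below computes precision, recall, and F1 by exact set matching at each level. It is a simplified illustration under that assumption, not the task's official scoring script; the level names, helper functions, and sample data are hypothetical.

```python
from __future__ import annotations

# The three granularity levels (hypothetical labels).
LEVELS = ("sentences", "phrases", "triples")


def prf1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 using exact matching between two sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def score_levels(pred: dict[str, set], gold: dict[str, set]) -> dict[str, tuple[float, float, float]]:
    """Score each granularity level independently; a missing level counts as empty."""
    return {lvl: prf1(pred.get(lvl, set()), gold.get(lvl, set())) for lvl in LEVELS}


if __name__ == "__main__":
    # Illustrative values only.
    gold = {
        "sentences": {0, 3, 5},  # indices of contribution sentences
        "phrases": {"BiLSTM-CRF model", "named entity recognition"},
        "triples": {("Contribution", "has research problem", "named entity recognition")},
    }
    pred = {
        "sentences": {0, 3, 7},
        "phrases": {"BiLSTM-CRF model"},
        "triples": set(),
    }
    for level, (p, r, f) in score_levels(pred, gold).items():
        print(f"{level:10s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Exact matching is the strictest possible choice; a real evaluation might normalize phrases or average scores across articles differently, so treat numbers from this sketch as illustrative.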

Task website: https://ncg-task.github.io/

Shared Task Competition website: https://competitions.codalab.org/competitions/25680

Download the Shared Task Dataset: https://zenodo.org/record/4737071

Relevant Publication

Jennifer D’Souza, Sören Auer, and Ted Pedersen (2021). SemEval-2021 Task 11: NLPContributionGraph - Structuring Scholarly NLP Contributions for a Research Knowledge Graph. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021).

A participating system in our Shared Task won the overall Best System Paper Award at SemEval 2021. Take a look at their paper!

Haoyang Liu, M. Janina Sarol, and Halil Kilicoglu (2021). UIUC_BioNLP at SemEval-2021 Task 11: A Cascade of Neural Models for Structuring Scholarly NLP Contributions. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021).

Funding Statement

This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and by the TIB Leibniz Information Centre for Science and Technology.