Scientific relation extraction approaches

About

The end-to-end task of lifting knowledge graphs from scholarly articles involves a cumulative setup of identifying entities and relations and, as a supplementary task, constructing coreference clusters. In this project, we specifically tackled relation identification and classification. Our systems handled six or seven relation classes depending on the corpus: Usage, Compare, Part-Whole, Result, Model-Feature, and Topic in the SemEval-18 corpus; or Used-for, Hyponym-of, Compare, Feature-of, Part-of, Evaluate-for, and Conjunction in the SciERC corpus.
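As an illustration, a single relation-classification instance pairs two annotated scientific terms in a sentence with one of the corpus's relation labels. The sentence and field names below are made up for illustration, not the actual SemEval-18 or SciERC schema:

```python
# A hypothetical relation-classification instance; field names are
# illustrative rather than the actual corpus annotation format.
instance = {
    "sentence": "We apply a conditional random field to named entity recognition.",
    "entity_1": "conditional random field",   # first scientific term
    "entity_2": "named entity recognition",   # second scientific term
    "label": "Usage",                         # one of the six SemEval-18 classes
}

# The classifier's job is to predict `label` given the sentence and the pair.
SEMEVAL18_LABELS = {"Usage", "Compare", "Part-Whole", "Result",
                    "Model-Feature", "Topic"}
assert instance["label"] in SEMEVAL18_LABELS
```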


Our Work at the 83rd Annual Meeting of the Association for Information Science and Technology (ASIS&T)

Since the cost of annotating datasets is high, in this work we designed a hybrid approach to extract scientific concept relations from scholarly publications which: (a) utilized syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) leveraged a supervised classifier to identify the relation type for each pair. Our system targeted high precision rather than high recall, reducing noisy results at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publication collections. More information can be found in our paper.

Ming Jiang, Jennifer D’Souza, Sören Auer, and J. Stephen Downie. “Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization.” In: 83rd Annual Meeting of the Association for Information Science and Technology (ASIS&T), 2020.
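A minimal sketch of step (a) of the pipeline, assuming simple lexico-syntactic cue-phrase rules in place of the paper's actual rule set: a connective such as "is used for" between two candidate terms links them as a related pair, which step (b) would then hand to a trained classifier.

```python
import re

# Hypothetical cue-phrase rules standing in for the paper's syntactic rules.
# Each maps a connective pattern to a coarse candidate relation type.
RULES = [
    (re.compile(r"(?P<t1>[\w -]+?) (?:is|are) used for (?P<t2>[\w -]+)"), "Usage"),
    (re.compile(r"(?P<t1>[\w -]+?) (?:is|are) part of (?P<t2>[\w -]+)"), "Part-Whole"),
]

def link_term_pairs(sentence):
    """Distant supervision: return (term1, term2, candidate_label) triples
    for every rule that fires on the sentence."""
    pairs = []
    for pattern, label in RULES:
        for m in pattern.finditer(sentence):
            pairs.append((m.group("t1").strip(), m.group("t2").strip(), label))
    return pairs

pairs = link_term_pairs("A transformer encoder is used for relation classification.")
# e.g. [('A transformer encoder', 'relation classification', 'Usage')]
```

Restricting extraction to pairs matched by such high-confidence rules is one way to favor precision over recall, in the spirit of the pipeline's design objective.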


Evaluating BERT-based models for Scientific Relation Classification

With the introduction of the BERT transformer language model, transformer-based language models pre-trained on large corpora have been widely explored for automatic scientific relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight BERT-based classification models, focusing on two key factors: 1) BERT model variants, and 2) classification strategies. More details about the model performances and our recommendations for their application in practical systems are given in our papers.

Ming Jiang, Jennifer D’Souza, Sören Auer, and J. Stephen Downie (2021). Evaluating BERT-based scientific relation classifiers for scholarly knowledge graph construction on digital library collections. In International Journal on Digital Libraries. Springer Science and Business Media LLC.

Ming Jiang, Jennifer D’Souza, Sören Auer, and J. Stephen Downie (2020). Improving Scholarly Knowledge Representation: Evaluating BERT-Based Models for Scientific Relation Classification. In: Ishita E., Pang N.L.S., Zhou L. (eds) Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol 12504. Springer, Cham. (Pre-print available at https://arxiv.org/abs/2004.06153) Best Student Paper Runner-up
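One common axis of variation among classification strategies is how the entity pair is exposed to the encoder. The sketch below shows entity-marker insertion, a widely used input-formatting strategy; the marker tokens and example are illustrative, not necessarily the exact scheme evaluated in our papers.

```python
def mark_entities(tokens, span1, span2):
    """Insert marker tokens around two entity spans (end-exclusive),
    so a BERT-style encoder can attend to the pair being classified."""
    (s1, e1), (s2, e2) = span1, span2
    out = []
    for i, tok in enumerate(tokens):
        if i == s1:
            out.append("[E1]")
        if i == s2:
            out.append("[E2]")
        out.append(tok)
        if i == e1 - 1:
            out.append("[/E1]")
        if i == e2 - 1:
            out.append("[/E2]")
    return out

tokens = "BERT embeddings improve relation classification".split()
marked = mark_entities(tokens, (0, 2), (3, 5))
# ['[E1]', 'BERT', 'embeddings', '[/E1]', 'improve',
#  '[E2]', 'relation', 'classification', '[/E2]']
```

In practice such marker tokens are registered as special tokens with the model's tokenizer, and the marked sequence is then fed to the encoder for classification.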