Research contribution-centric named entity recognition (NER) in Computer Science
Rule-based Titles Parser
The systems described in this section were implemented only for scholarly article titles in Computational Linguistics (CL).
The first system, CL-TitleParser, parses and types scientific entities from the titles of Computational Linguistics scholarly articles written in English. Specifically, it types each entity as one of six concepts: research problem, solution, resource, language, tool, and method.
Code: https://github.com/jd-coderepos/cl-titles-parser/
Publication: Jennifer D’Souza and Sören Auer (2021). Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. In: Ke HR., Lee C.S., Sugiyama K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. (Pre-print available at https://arxiv.org/abs/2109.00199)
The second system, CL-ShortTitles-Parser, parses and types phrases from the titles of Computational Linguistics scholarly articles written in English as scientific entities. It types each entity as one of the following seven semantic concepts: research problem, solution, resource, language, tool, method, and dataset.
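To give a feel for what pattern-based typing of title phrases can look like, here is a minimal Python sketch. It is not taken from either parser: the single connector rule ("<solution> for <research problem> using <method>"), the function name `type_title`, and the underscore-style concept keys are all assumptions made for illustration. The actual rules are in the linked repository and publication.

```python
import re

# Hypothetical connector rule, for illustration only (not a rule from CL-TitleParser):
#   "<solution> for <research problem> [using <method>]"
TITLE_PATTERN = re.compile(
    r"^(?P<solution>.+?)\s+for\s+(?P<research_problem>.+?)"
    r"(?:\s+using\s+(?P<method>.+))?$",
    re.IGNORECASE,
)


def type_title(title):
    """Map concept types to phrases for one title; empty dict if the rule does not apply."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return {}
    # Drop optional groups that did not participate in the match.
    return {concept: phrase for concept, phrase in match.groupdict().items() if phrase}


example = type_title("A Neural Parser for Semantic Role Labeling using Attention")
print(example)
```

A single regular expression like this only covers one title shape; a practical rule-based parser would need many such patterns plus handling for titles that match none of them.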
Machine-learning-based Titles and Abstracts Parser
The ORKG CS-NER system is based on a standardized set of seven contribution-centric scholarly entities, namely research problem, solution, resource, language, tool, method, and dataset. It can automatically extract all seven entity types from Computer Science publication titles. In addition, it can extract the research problem and method entity types from Computer Science publication abstracts. Details of the sequence labeling machine learner can be found in our preprint publication.
D'Souza, Jennifer, and Sören Auer. Computer Science Named Entity Recognition in the Open Research Knowledge Graph. arXiv preprint arXiv:2203.14579 (2022).
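Because the system is a sequence labeler, its per-token predictions have to be grouped into typed entity phrases. The sketch below shows one common way to do that, assuming a BIO tagging scheme. The scheme, the tag names such as `B-research_problem`, and the helper `decode_bio` are assumptions for illustration; the tag format actually used by ORKG CS-NER and its dataset may differ.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, phrase) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # "B-" opens an entity; a stray "I-" of another type is treated the same way.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical title and tags, chosen only to exercise the decoder.
tokens = ["Neural", "Parsing", "for", "Semantic", "Role", "Labeling"]
tags = ["B-solution", "I-solution", "O",
        "B-research_problem", "I-research_problem", "I-research_problem"]
print(decode_bio(tokens, tags))
```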
Download our dataset: https://github.com/jd-coderepos/contributions-ner-cs
Funding Statement
This work is supported by TIB Leibniz Information Centre for Science and Technology and the EU H2020 ERC project ScienceGRaph (GA ID: 819536).