SOTA?
Tracking the State-of-the-Art in Scholarly Publications
Background
In Artificial Intelligence (AI), a common research objective is the development of new models that can report state-of-the-art (SOTA) performance. The reporting usually comprises four integral elements: Task, Dataset, Metric, and Score (TDMS). An illustrated example is shown in the Figure below.
TDMS tuples mined from AI research papers go on to power leaderboards in the community. Leaderboards, akin to scoreboards, are platforms that display the scores of various AI models on specific tasks, datasets, and metrics; traditionally, they have been curated by hand by the community. Examples of such platforms include the benchmarks feature of the Open Research Knowledge Graph and Papers with Code (PwC). Text mining techniques enable a transition from this conventional community-based curation to an automated extraction approach. Consequently, we introduce the "SOTA? Tracking the State-of-the-Art in Scholarly Publications" shared task as Task 4 in the SimpleText track of CLEF 2024. The goal of the SOTA? shared task is to develop systems that, given the full text of an AI paper, can recognize whether the paper reports model scores on benchmark datasets and, if so, extract all pertinent (Task, Dataset, Metric, Score) tuples presented within the paper. The Figure below shows the scholarly paper source and the downstream Leaderboard application powered by the extracted TDMS tuples.
Figure: End-to-end workflow of extracting (Task, Dataset, Metric, Score) tuples from an AI scholarly publication to power the Leaderboards dashboard on the Open Research Knowledge Graph.
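To make the four elements concrete, here is a minimal sketch of one TDMS tuple represented in Python; the task, dataset, metric, and score values are illustrative assumptions, not drawn from the shared task data.

```python
# One (Task, Dataset, Metric, Score) tuple as a dictionary.
# All values below are hypothetical examples, not shared-task annotations.
tdms = {
    "Task": "Question Answering",
    "Dataset": "SQuAD 1.1",
    "Metric": "F1",
    "Score": "93.2",
}

# A single paper typically reports several such tuples,
# e.g. one per row of its results table.
paper_tuples = [tdms]
print(paper_tuples)
```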
Task Overview
The SOTA? shared task is defined on a dataset of Artificial Intelligence scholarly articles. The dataset contains two kinds of articles: those that report (Task, Dataset, Metric, Score) tuples and those that do not. For the articles that report TDMS tuples, all reported TDMS annotations are provided in a separate file accompanying the scraped full text of the articles. The extraction task is defined as follows.
Develop a machine learning model that, given a scholarly article as input, can distinguish whether the article reports TDMS tuples or not, and, for articles that do, extract all the relevant tuples.
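The two-part task definition above (classification, then extraction) can be sketched as a single function interface. The type alias, the function name, and the keyword heuristic below are illustrative assumptions, not part of the shared task specification; a real submission would replace the placeholder logic with a trained model or an LLM-based extractor.

```python
from typing import List, Optional, Tuple

# (Task, Dataset, Metric, Score) -- all four elements kept as strings here.
TDMS = Tuple[str, str, str, str]

def extract_tdms(full_text: str) -> Optional[List[TDMS]]:
    """Sketch of the expected system behaviour (not a real model).

    Returns None when the paper is judged not to report any TDMS tuples;
    otherwise returns every extracted (Task, Dataset, Metric, Score) tuple.
    """
    # Placeholder heuristic standing in for the classification step:
    # a real system would run an ML/LLM classifier here.
    if "state-of-the-art" not in full_text.lower():
        return None
    # Dummy tuple standing in for the extraction step's actual output.
    return [("Question Answering", "SQuAD 1.1", "F1", "93.2")]
```

Separating the "does this paper report scores at all?" decision from the tuple extraction mirrors the two-kind split of the dataset described above.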
Given the recent upsurge of developments in generative AI in the form of Large Language Models (LLMs), creative LLM-based solutions to the task are particularly invited. The task places no restrictions on the use of open-source versus closed-source LLMs; nonetheless, the development of open-source solutions is encouraged.
For more background information on this task, we recommend the following publications:
Salomon Kabongo, Jennifer D'Souza and Sören Auer (2023). Zero-Shot Entailment of Leaderboards for Empirical AI Research. In: ACM/IEEE Joint Conference on Digital Libraries, JCDL 2023, Santa Fe, NM, USA, 2023, pp. 237-241. https://doi.org/10.1109/JCDL57899.2023.00042 (Pre-print available at https://arxiv.org/abs/2303.16835)
Salomon Kabongo, Jennifer D'Souza and Sören Auer (2023). ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph. International Journal on Digital Libraries (2023). https://doi.org/10.1007/s00799-023-00366-1
Organizers
Dr. Jennifer D'Souza - TIB Leibniz Information Centre for Science and Technology, Germany
Salomon Kabongo - L3S Research Center, Germany
Hamed Babaei Giglou - TIB Leibniz Information Centre for Science and Technology, Germany
Yue Zhang - Berlin Technical University, Germany
Prof. Dr. Sören Auer - TIB Leibniz Information Centre for Science and Technology, Germany
Contact
sota.task [at] gmail.com
Funding Statement
SimpleText's SOTA Task is jointly supported by the NFDI4DataScience initiative (DFG, German Research Foundation, Grant ID: 460234259) and the SCINEXT project (BMBF, German Federal Ministry of Education and Research, Grant ID: 01lS22070).