LLMs4OL: Large Language Models for Ontology Learning

LLMs4OL Challenge @ ISWC 2024

ISWC 2024, Baltimore, Maryland, USA

11-15 November, 2024


Background

The Semantic Web aims to enrich the current web with structured knowledge and metadata for enhanced interoperability and understanding across systems. Central to this endeavor is Ontology Learning (OL), which automates the extraction of this structured knowledge from unstructured data, crucial for building dynamic ontologies foundational to the Semantic Web. The advent of Large Language Models (LLMs) has introduced a promising approach to OL, leveraging their deep linguistic understanding and pattern inference capabilities to automate OL. 


Our prior work, published in the ISWC 2023 research track proceedings under the title "LLMs4OL: Large Language Models for Ontology Learning," marked a notable step towards employing LLMs in OL, demonstrating their potential in automating knowledge acquisition and representation for the Semantic Web. Building on this research, the LLMs4OL Challenge@ISWC-2024 is organized as a community development endeavor within the 23rd ISWC call for challenges. With the ISWC-LLMs4OL 2024 challenge, we aim to catalyze community-wide engagement in validating and expanding the use of LLMs in OL. This initiative is poised to advance our comprehension of LLMs' roles within the Semantic Web, encouraging innovation and collaboration in developing scalable and accurate ontology learning methods.


This challenge aims to align with the Semantic Web community's goals of making the web more intelligent and user-friendly, offering a novel avenue for exploring the intersection of LLMs and OL. Participation in this challenge will contribute to evolving the Semantic Web, enabling more sophisticated services that utilize structured knowledge effectively.

Challenge Overview

For the ISWC-LLMs4OL 2024 challenge, we have defined three main tasks. OL tasks revolve around the following ontology primitives:

1. lexical entries L

2. conceptual types T

3. a hierarchical taxonomy H_T

4. non-taxonomic relations R within a heterarchy H_R

5. axioms A for constraints and rules.
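As a concrete illustration, the five primitives above could be modeled with simple data structures. This is a minimal sketch for orientation only; the class and field names are our own and not part of any challenge API:

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Illustrative container for the five OL primitives."""
    lexicon: set = field(default_factory=set)    # L: lexical entries
    types: set = field(default_factory=set)      # T: conceptual types
    taxonomy: set = field(default_factory=set)   # H_T: (subtype, supertype) pairs
    relations: set = field(default_factory=set)  # R: (head, relation, tail) triples in a heterarchy H_R
    axioms: list = field(default_factory=list)   # A: constraints and rules

# Toy example (values illustrative, not drawn from any challenge dataset):
onto = Ontology()
onto.lexicon.add("aspirin")
onto.types.add("Pharmacologic Substance")
onto.taxonomy.add(("Pharmacologic Substance", "Chemical"))
onto.relations.add(("aspirin", "treats", "headache"))
```

The three challenge tasks then correspond to populating parts of such a structure: assigning entries of L to types in T, arranging T into H_T, and discovering relations R.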


Key OL activities include corpus preparation, terminology extraction, term typing, taxonomy construction, relationship extraction, and axiom discovery. Together, these six tasks constitute the LLMs4OL task framework (see the following figure), aligning with the previously outlined LLMs4OL conceptual model.

Assuming that the corpus preparation step is done by reusing ontologies publicly released in the community, we introduce three main tasks for the first iteration of the LLMs4OL Challenge@ISWC-2024.



Participation in all three tasks of the LLMs4OL Challenge@ISWC-2024 is not mandatory. Participants can enroll in any single task (A, B, or C), any pair of tasks (A and B, A and C, or B and C), or all three. Furthermore, each task offers several ontologies, and participants may address one, several, or all of them. For example, Task A offers WordNet, GeoNames, NCI, etc. A participant in Task A could address the task only for WordNet, for WordNet and GeoNames, for GeoNames and NCI, or, as we recommend, for all the provided ontologies. Participants are encouraged to implement LLM-based solutions, and we do not impose any restrictions on the LLM prompting methods. For instance, you can choose to bring in additional context information from the World Wide Web to enrich the training and test instances.
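To make the prompting freedom concrete, a few-shot term-typing prompt (in the spirit of Task A) might be assembled as in the sketch below. The prompt wording, the example terms and types, and the `query_llm` wrapper are all hypothetical placeholders, not a prescribed interface:

```python
def build_term_typing_prompt(term, examples):
    """Assemble a few-shot prompt asking an LLM to assign a type to a term."""
    lines = ["Assign the most appropriate ontological type to each term."]
    for ex_term, ex_type in examples:  # in-context demonstrations
        lines.append(f"Term: {ex_term}\nType: {ex_type}")
    lines.append(f"Term: {term}\nType:")  # the query term, type left blank
    return "\n\n".join(lines)

# Few-shot examples would be drawn from the released training split
# (these values are illustrative only).
examples = [("dog", "animal"), ("oak", "plant")]
prompt = build_term_typing_prompt("salmon", examples)

# The prompt would then be sent to an LLM of the participant's choice, e.g.:
# prediction = query_llm(prompt)   # query_llm is a hypothetical wrapper
```

Participants are free to use entirely different prompt formats, chain-of-thought strategies, retrieval of external context, or fine-tuning instead.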

There are two main evaluation phases for the challenge.


1. Few-shot testing phase 

Each ontology selected for system training will be divided into two parts: one part will be released for training the systems, and the other will be reserved for testing them in this phase. Furthermore, the evaluations will be conducted in two stages over consecutive time windows. In the first stage, only the test datasets for Task A will be released. The first-stage evaluation window will then be declared closed, i.e., no submissions after the stage 1 evaluation period will be evaluated. Upon closing stage 1, the evaluations for stage 2 will commence, and the test datasets for Tasks B and C will be released. This is deliberately planned to prevent any term types released in the Task B or C test sets from influencing systems' performance on the Task A test set.


2. Zero-shot testing phase

The zero-shot testing phase will also be organized into two stages, running in parallel with the few-shot testing phase. The only difference is that new ontologies, unseen during training, will be introduced. The first stage will test Task A, and the subsequent stage will test Tasks B and C. The objective of the zero-shot testing phase is to evaluate the generalizability and transferability of the LLM-based solutions developed in this challenge.

NOTE: The ontologies offered for each task of the LLMs4OL challenge come from publicly available sources. Participation mandates that participants do not re-create new datasets from the publicly sourced ontologies, as we reserve a portion of each ontology for our few-shot testing phase. Thus, participants are strictly restricted to using the datasets that we, the challenge organizers, release. Training on datasets re-created from the full source ontologies would imply unfair testing of the systems, and such systems would not be considered solutions to the OL problem. However, for our given datasets, participants are free to introduce context information from the World Wide Web, provided that this is explicitly stated in the supporting system submission documentation (either in the publication or the README) and is considered part of the methodology.

References


1. Babaei Giglou, H., D’Souza, J., Auer, S. (2023). LLMs4OL: Large Language Models for Ontology Learning. In: Payne, T.R., et al. The Semantic Web – ISWC 2023. ISWC 2023. Lecture Notes in Computer Science, vol 14265. Springer, Cham. https://doi.org/10.1007/978-3-031-47240-4_22

2. Babaei Giglou, H. (2023). LLMs4OL: Large Language Models for Ontology Learning [GitHub repository]. https://github.com/HamedBabaei/LLMs4OL

Organizers

TIB Leibniz Information Centre for Science and Technology


Funding Statement

LLMs4OL Challenge @ ISWC-2024 is jointly supported by the NFDI4DataScience initiative (DFG, German Research Foundation, Grant ID: 460234259) and the SCINEXT project (BMBF, German Federal Ministry of Education and Research, Grant ID: 01lS22070).

Questions?

Contact llms4ol.challenge [at] gmail.com to get more information on the challenge.