LLMs4OL 2025: Large Language Models for Ontology Learning
The 2nd LLMs4OL Challenge @ ISWC 2025
ISWC 2025, Nara, Japan | 2-6 November
This task focuses on extracting ontological types and terms from unstructured text. Given an unstructured corpus of documents, the goal is to identify the foundational elements for ontology construction by recognizing domain-relevant vocabulary and categorizing it appropriately. We aim to extract:
Terms (or Entities): These are specific terms that form the basis of an ontology. They populate the ontology by instantiating the defined classes. For instance, COVID-19 is a term of the type Disease, and Paris is a term of the type City.
Types (or Classes): These are abstract categories or groupings that represent general concepts within a domain. They form the backbone of an ontology's structure. Examples include Disease, Vehicle, or City.
By identifying and extracting these elements, the task helps bridge the gap between unstructured natural language and structured ontological knowledge. This process is critical for building knowledge representations that support reasoning, semantic integration, and advanced information retrieval.
Given a set of documents from one domain, extract all relevant terms that could form the basis of an ontology.
For both subtasks, each dataset corresponds to a specific domain, as outlined below:
Ecology: A dataset that considers the construction of an ontology based on concepts and terminology in the ecology domain.
Scholarly: A dataset that considers the construction of an ontology grounded in scholarly communication and the academic publishing domain.
Engineering: A dataset that considers the construction of an ontology derived from the engineering domain, including relevant structures, processes, and terminologies.
The datasets are available in the "TaskA-Text2Onto/" directory of the challenge repository: https://github.com/sciknoworg/LLMs4OL-Challenge/tree/main/2025.
For each dataset, five key files are provided:
The documents.jsonl file contains the textual documents. Each line in the file represents a single document with id, title, and text fields. This file is provided as input to the systems for both subtasks. Example document:
{
"id": "34_0",
"title": "Types of Distances in Units of Measure",
"text": "In the realm of Units of Measure, distance is a fundamental concept that comes in various forms. The distance modulus is categorized as a type of distance. Additionally, there are several other types of distances, including the total 3D start-end distance, xy 2D start-end distance, total distance travelled, and xy distance travelled, all of which fall under the broader category of distance. Understanding these different types of distances is crucial for accurate measurements in various fields."
}
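As an illustration only (not part of the official starter kit), the corpus can be loaded line by line with standard Python; the file name and field names follow the description above, and the path is assumed to point at the released dataset:

import json

def load_documents(path="documents.jsonl"):
    """Read the corpus: one JSON object per line with id, title, and text fields."""
    documents = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip any blank lines
                documents.append(json.loads(line))
    return documents

docs = load_documents()
print(docs[0]["id"], docs[0]["title"])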
The terms.txt file is a plain text file listing terms (one per line). This file defines the vocabulary of interest for SubTask A1.
The types.txt file is a plain text file listing the available types (one per line). This file defines the types of interest for SubTask A2.
An additional terms2docs.json file maps each term to the list of document IDs in which the term appears. This links terms to their context in the source data and supports methods that use term-document co-occurrence.
An additional terms2types.json file maps each term to its type(s). These types are typically drawn from the predefined set in types.txt and serve as labels for ontology concept typing. A minimal sketch of loading these auxiliary files and joining them for one training term is shown after this description.
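The sketch below assumes the file names and key layout described above (terms and types one per line, and JSON objects keyed by term); it is illustrative rather than an official baseline:

import json

def load_lines(path):
    """Read a plain-text file with one entry per line (terms.txt / types.txt)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

terms = load_lines("terms.txt")
types = load_lines("types.txt")

with open("terms2docs.json", encoding="utf-8") as f:
    terms2docs = json.load(f)   # term -> list of document IDs
with open("terms2types.json", encoding="utf-8") as f:
    terms2types = json.load(f)  # term -> list of type labels

# Example: look up the source documents and gold types for one training term.
term = terms[0]
print(term, terms2docs.get(term, []), terms2types.get(term, []))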
🧪 During the testing phase, only the documents.jsonl file will be given; systems are expected to produce terms.txt for SubTask A1 and types.txt for SubTask A2.
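To make the expected input/output contract concrete, the following minimal sketch reads the test documents and writes the two submission files in the one-entry-per-line format described above. The extract_candidates function is a deliberately naive stand-in (capitalized tokens); a real system would replace it with an LLM prompt or a trained term-extraction model:

import json

def extract_candidates(text):
    """Placeholder extraction step: a real system would use an LLM or a term
    extraction model; here we simply keep capitalized tokens as a stand-in."""
    return {tok.strip(".,;:()") for tok in text.split() if tok[0].isupper()}

predicted_terms, predicted_types = set(), set()
with open("documents.jsonl", encoding="utf-8") as f:
    for line in f:
        doc = json.loads(line)
        predicted_terms |= extract_candidates(doc["text"])
        # SubTask A2 would analogously populate predicted_types with type labels.

# Write the expected outputs: one entry per line.
with open("terms.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(predicted_terms)))
with open("types.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(predicted_types)))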