LLMs4OL 2026: Large Language Models for Ontology Learning

The 3rd LLMs4OL Challenge @ ISWC 2026

ISWC 2026, Bari, Italy | 25-29 October

Taxonomy Task

Taxonomy Learning

Definition: Given ontologies from source domains, induce a hierarchical taxonomy for an unseen domain.

Motivation: Many ontology learning systems overfit to specific domains: methods tuned for medicine or chemistry may fail when applied to a new domain, and taxonomy learning in particular is often treated as a domain-specific problem. This task tests cross-domain robustness, which is critical for real-world OL systems. LLMs, with their general world knowledge, may transfer across domains better than purely statistical or rule-based methods.

Objective

Participants must discover a taxonomy (class hierarchy) in a target domain after training on, or observing, ontologies from other source domains. Unlike the End-to-End OL or Ontology Extension tasks, this task focuses solely on discovering hierarchical structure and tests cross-domain generalization. Systems are trained on ontologies from domains such as medicine, engineering, or chemistry and evaluated on an unseen domain, e.g., material science. The goal is to correctly infer is-a / subclass relationships among types in the new domain. Participants can either develop their own taxonomy discovery algorithm or adapt existing taxonomy induction methods, including LLM-based or embedding-based approaches.

What must participants build?

A system that, given the types (classes) of an unseen ontology, performs:

Taxonomic Discovery (is-a/subclass)

A toy example

Assume the model has learned how to build a taxonomy from the source domains. Given the following list of types:

device

sensor

system

application

environmental condition

measurement

person

We expect the following outputs:

(sensor, is-a, device)

(system, is-a, device)

(application, is-a, device)
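Before submitting, it can help to sanity-check predicted triples against the input type list. The sketch below is only illustrative: `validate_triples` is a hypothetical helper, not part of OntoLearner, and the checks it performs (known types, no self-loops, no cycles) are assumptions about what a well-formed taxonomy looks like, not official submission rules.

```python
from collections import defaultdict


def validate_triples(types, triples):
    """Return a list of problems found in (child, "is-a", parent) triples."""
    known = set(types)
    problems = []
    parents = defaultdict(set)  # child -> set of parents

    for child, relation, parent in triples:
        if relation != "is-a":
            problems.append(f"unexpected relation: {relation!r}")
        for term in (child, parent):
            if term not in known:
                problems.append(f"unknown type: {term!r}")
        if child == parent:
            problems.append(f"self-loop: {child!r}")
        else:
            parents[child].add(parent)

    # Depth-first search over child -> parent edges to detect cycles.
    state = {}  # term -> "visiting" or "done"

    def has_cycle(term):
        if state.get(term) == "visiting":
            return True
        if state.get(term) == "done":
            return False
        state[term] = "visiting"
        found = any(has_cycle(p) for p in parents.get(term, ()))
        state[term] = "done"
        return found

    if any(has_cycle(term) for term in list(parents)):
        problems.append("cycle detected")
    return problems


# The toy example from this page.
types = ["device", "sensor", "system", "application",
         "environmental condition", "measurement", "person"]
triples = [("sensor", "is-a", "device"),
           ("system", "is-a", "device"),
           ("application", "is-a", "device")]
problems = validate_triples(types, triples)
```

For the toy example, `problems` is an empty list; a pair of triples that point at each other would add a "cycle detected" entry.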

Dataset

Data Source: The dataset for this task is provided through OntoLearner, a comprehensive ontology repository hosting 180+ ontologies spanning 22 different domains. OntoLearner serves as both the training corpus and evaluation framework for this taxonomy learning task.

Training Data: Participants are allowed to use all available ontologies from OntoLearner for training and validation purposes. This includes ontologies from domains such as:

Medicine
Agriculture
And 15+ other domains

How to access the training data (example):

from ontolearner.ontology.events import Conference

# Load the Conference ontology, then extract the LLMs4OL task data
ontology = Conference()
ontology.load()
llms4ol_data = ontology.extract()

# Input types and the gold parent-child pairs
types = llms4ol_data.type_taxonomies.types
taxonomy = llms4ol_data.type_taxonomies.taxonomies

Here, types is the list of input types and taxonomy is the expected output, which can be used as training data. Both look as follows:

types:

[
  "Organisational role during event",
  "Affiliation role",
  "Break",
  "Meal",
  "Proceedings",
  ...
]

taxonomy:

[
  {
    "ID": "TR_efb835d1",
    "parent": "Organised event",
    "child": "Academic event"
  },
  {
    "ID": "TR_d884cc67",
    "parent": "Organised event",
    "child": "Non academic event"
  },
  {
    "ID": "TR_3171e755",
    "parent": "Time indexed situation",
    "child": "AffiliationAtTimeOfSubmission"
  },
  ...
]

You can then translate the taxonomy yourself into (child, is-a, parent) format. (Note 1: this format is required when submitting for evaluation; it is optional during training.)

("Academic event", "is-a", "Organised event")

("Non academic event", "is-a", "Organised event")

("AffiliationAtTimeOfSubmission", "is-a", "Time indexed situation")

....
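The translation into (child, is-a, parent) triples can be sketched in a few lines. This assumes the taxonomy records are plain dictionaries with "parent" and "child" keys, as in the listing above; if OntoLearner returns objects with attributes instead, the key lookups would need to become attribute accesses. `to_triples` is a hypothetical helper name.

```python
def to_triples(taxonomy):
    """Turn parent/child records into (child, "is-a", parent) triples."""
    return [(record["child"], "is-a", record["parent"]) for record in taxonomy]


# Two of the records shown above, written as plain dictionaries (an assumption).
taxonomy = [
    {"ID": "TR_efb835d1", "parent": "Organised event", "child": "Academic event"},
    {"ID": "TR_d884cc67", "parent": "Organised event", "child": "Non academic event"},
]
triples = to_triples(taxonomy)
```

Note that the order flips: each record lists the parent first, while the submission format puts the child first.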

To explore more domains and their ontologies, and to see how to load them via OntoLearner, visit https://ontolearner.readthedocs.io/benchmarking/benchmark.html. (Note 2: Participants may use as many OntoLearner ontologies as they wish. There is no restriction on the training data, except that ontologies from the test domains must not be used.)

Important Constraint: While cross-domain training is encouraged, training on the specific test domains is strictly prohibited to ensure fair evaluation of generalization capabilities.

Test Domains: The evaluation will be conducted on ontologies from the following held-out domains:

Material Science and Engineering (expect 2-5 ontologies)
Biology and Life Sciences (expect 1-2 ontologies)
Scholarly Knowledge (expect 2-5 ontologies)
Ecology and Environment (expect 1-2 ontologies)
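One way to guard against accidentally training on a held-out domain is to filter the training pool by domain name up front. The sketch below is an assumption-laden illustration: `select_training_ontologies` and the catalogue entries are hypothetical, and the domain strings are copied from the list above, so they may not match the exact labels OntoLearner uses.

```python
# Held-out test domains, copied from the list above.
TEST_DOMAINS = {
    "Material Science and Engineering",
    "Biology and Life Sciences",
    "Scholarly Knowledge",
    "Ecology and Environment",
}


def select_training_ontologies(catalogue):
    """Keep only ontologies whose domain is not a held-out test domain."""
    held_out = {domain.casefold() for domain in TEST_DOMAINS}
    return [name for name, domain in catalogue
            if domain.casefold() not in held_out]


# Hypothetical (ontology name, domain) pairs, for illustration only.
catalogue = [
    ("Conference", "Events"),
    ("ExampleMaterialsOntology", "Material Science and Engineering"),
    ("ExampleMedicalOntology", "Medicine"),
]
training = select_training_ontologies(catalogue)
```

Here `training` keeps the Conference and medical entries and drops the materials one.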

Evaluation Metrics

Standard Metrics: Precision, Recall, F1
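As a rough guide to how these metrics behave on triples, the sketch below computes them by exact set matching. This is an assumption about the scoring, not the official evaluation script, which may normalize labels or match triples differently; `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 over sets of triples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [("sensor", "is-a", "device"),
        ("system", "is-a", "device"),
        ("application", "is-a", "device")]
predicted = [("sensor", "is-a", "device"),
             ("system", "is-a", "device"),
             ("person", "is-a", "device")]  # one wrong triple
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

With two of three predictions correct and two of three gold triples recovered, precision, recall, and F1 all come out to 2/3.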

Quick Start

import ontolearner

# Load the ontology and extract the task data
ontology = ontolearner.ontology.Conference()
ontology.load()
ontological_data = ontology.extract()

# Split the extracted data into train and test sets
train_data, test_data = ontolearner.train_test_split(
    ontological_data,
    test_size=0.2,
    random_state=42
)

task = 'taxonomy-discovery'

# Initialize the LLM learner with prompting and label mapping strategies
llm_learner = ontolearner.AutoLLMLearner(
    prompting=ontolearner.StandardizedPrompting,
    label_mapper=ontolearner.LabelMapper(),  # Convert between label formats and natural language
    token='your-huggingface-token'
)

# Initialize the retriever
retriever_learner = ontolearner.AutoRetrieverLearner(top_k=5)

# Create a RAG pipeline
rag_learner = ontolearner.AutoRAGLearner(llm=llm_learner, retriever=retriever_learner)
rag_learner.load(retriever_id='sentence-transformers/all-MiniLM-L6-v2', llm_id='Qwen/Qwen2.5-0.5B-Instruct')

# Fit, predict, and evaluate
rag_learner.fit(train_data, task=task)
predicts = rag_learner.predict(test_data, task=task)
truth = rag_learner.tasks_ground_truth_former(data=test_data, task=task)
metrics = ontolearner.evaluation_report(y_true=truth, y_pred=predicts, task=task)

print(metrics)

Reference: https://ontolearner.readthedocs.io/learners/rag.html

Other starting points could be:

RWTH-DBIS Learner: https://ontolearner.readthedocs.io/learners/llms4ol_challenge/rwthdbis_learner.html#taxonomy-discovery
SKH-NLP Learner: https://ontolearner.readthedocs.io/learners/llms4ol_challenge/skhnlp_learner.html

What approaches can be developed?

Fine-tuning an embedding model for parent–child relationships: Instead of relying on generic semantic similarity, you can train embeddings specifically to capture hierarchical relations.
Hyperbolic embeddings for taxonomy preservation: Hyperbolic space is particularly well suited to hierarchical data because its volume grows exponentially with radius, much as the number of nodes in a tree grows with depth.
Fine-tuning LLMs on given taxonomies: You can adapt a large language model to internalize a specific taxonomy structure.
Context-informed prompting: Rather than training, you can leverage prompting strategies to guide the model dynamically.
Agentic AI approach: Moving beyond single-step predictions to multi-step reasoning systems that actively build and refine taxonomies.
Hybrid approaches
or ...
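For the hyperbolic-embedding option listed above, a common choice is the Poincaré ball model, whose distance is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements just that distance in plain Python; the coordinates are made up for illustration, and training such embeddings is out of scope here.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    denominator = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_dist / denominator)


# Hypothetical 2-D embeddings: a root near the origin, two leaves near the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

root_to_leaf = poincare_distance(root, leaf_a)
leaf_to_leaf = poincare_distance(leaf_a, leaf_b)
```

In this toy layout the two leaves are much farther from each other than either is from the root, which is the tree-like behavior that makes the space attractive for taxonomies.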

There is no restriction in terms of approaches!
