LLMs4OL 2026: Large Language Models for Ontology Learning
The 3rd LLMs4OL Challenge @ ISWC 2026
ISWC 2026, Bari, Italy | 25-29 October
We are excited to introduce OntoLearner, a powerful, modern framework purpose-built for ontology learning tasks. This guide explains why you should use it and how to get started quickly.
OntoLearner is a modular, open-source Python framework designed for semi-automatic construction and enrichment of ontologies from unstructured sources, powered by LLMs. It bridges the gap between traditional ontology engineering and cutting-edge AI—giving you access to state-of-the-art language understanding without the burden of manual effort. Key advantages of OntoLearner over manual or legacy approaches are:
LLM-Powered Performance: Rather than relying on hand-crafted rules or static knowledge bases, OntoLearner harnesses foundation models for genuine semantic understanding. The framework automatically captures domain nuances and can combine retrieval-augmented generation (RAG) with LLM reasoning, meaning you get both speed and accuracy. You are not reinventing the wheel—you are standing on the shoulders of billions of tokens of pre-trained knowledge.
Comprehensive Ontology Support: You don't need to scramble for training data. OntoLearner ships with curated, production-grade ontologies ready to go. For example:
AgrO gives you over 1,000 concepts covering agriculture and agronomy.
SUMO is one of the largest formal public ontologies in existence and is already mapped to WordNet.
The Common Core Ontologies (CCO) provide reusable foundational concepts that work across domains.
Beyond these, the framework supports domain-specific ontologies for finance, material science and engineering, and more—all accessible through a simple load() call.
Battle-Tested Evaluation: Gone are the days of uncertainty about whether your model is actually good. OntoLearner includes built-in metrics (precision, recall, F1, and domain-specific benchmarks), standardized datasets extracted from real ontologies, and automatic train/test split handling. You can immediately see how you're performing and compare against baselines. The evaluation pipeline is comprehensive enough that you can trust your results.
Fast Prototyping to Competition-Ready Code: Most frameworks bog you down in boilerplate. OntoLearner gets you from zero to a trained model in under 50 lines of Python. The pipeline is modular, so you can easily swap out different retrievers, LLMs, and tasks as you experiment. And unlike research prototypes, the framework includes production-ready logging and error handling, so your code is actually reliable when it matters.
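To make the built-in evaluation concrete, here is a minimal sketch of how precision, recall, and F1 are typically computed for term typing over (term, type) pairs. OntoLearner's evaluation pipeline computes these for you; this standalone snippet only illustrates the idea and is not the framework's actual implementation.

```python
# Illustrative precision/recall/F1 over predicted vs. gold (term, type) pairs.
# This is a sketch, not OntoLearner's built-in evaluator.

def precision_recall_f1(predicted: set, gold: set):
    tp = len(predicted & gold)  # correctly predicted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

pred = {("wheat", "crop"), ("tractor", "crop")}
gold = {("wheat", "crop"), ("tractor", "equipment")}
print(precision_recall_f1(pred, gold))  # → (0.5, 0.5, 0.5)
```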
Installation
pip install ontolearner
Your First Ontology Learning Pipeline:
from ontolearner import LearnerPipeline, AgrO, train_test_split
# 1. Load a curated ontology (automatic download from Hugging Face)
ontology = AgrO()
ontology.load()
# 2. Extract and split the dataset
train_data, test_data = train_test_split(
    ontology.extract(),
    test_size=0.2,
    random_state=42
)
# 3. Build your learning pipeline (retriever + LLM)
pipeline = LearnerPipeline(
    retriever_id='sentence-transformers/all-MiniLM-L6-v2',  # Fast, accurate retrieval
    llm_id='Qwen/Qwen2.5-0.5B-Instruct',  # Efficient LLM
    batch_size=32,
    top_k=5
)
# 4. Train, predict, and evaluate in one call
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,
    task='term-typing'  # Or 'concept-discovery', 'relation-extraction', etc.
)
# 5. Check your results
print("Metrics:", outputs['metrics'])
print("Elapsed Time:", outputs['elapsed_time'])
That's it! You now have a trained ontology learning model with full evaluation metrics.
OntoLearner doesn't lock you into a single task type. Depending on what your task requires, you can tackle term typing, taxonomy discovery, and non-taxonomic relationship extraction. The same framework adapts to all of these without requiring you to rewrite everything from scratch. This means you have a starting point for your novel contribution to the challenge.
Ontologizer is a foundational module within OntoLearner that transforms ontologies into programmatically accessible Python objects, enabling seamless loading, inspection, and reuse across diverse domains. It supports multiple ontology formats (OWL, RDF, XML, TTL) and integrates metadata management, automated metric evaluation, and documentation generation to ensure ontologies are FAIR-compliant and traceable. By allowing users to import ontologies directly from web sources or HuggingFace repositories without manual file handling, Ontologizer simplifies ontology modularization and promotes scalable, cross-domain ontology enrichment. It supports version control and collaborative updates, and it optimizes performance for large ontologies with multiprocessing, providing a flexible, user-friendly foundation for ontology-driven workflows and research.
How does Ontologizer work?
from ontolearner import AgrO
# 1. Initialize an ontologizer from OntoLearner
ontology = AgrO()
# 2. Load the ontology automatically from Hugging Face
ontology.load()
# 3. Extract the learning task dataset
data = ontology.extract()
print(ontology)
# outputs:
# ontology_id: AgrO
# ontology_full_name: Agronomy Ontology (AgrO)
# domain: Agriculture
# category: Agronomy
# version: 1.0
# last_updated: 2022-11-02
# creator: The Crop Ontology Consortium
# license: Creative Commons 4.0
# format: RDF
# download_url: https://agroportal.lirmm.fr/ontologies/AGRO?p=summary
💡 Learn more about Ontologizer at https://ontolearner.readthedocs.io/ontologizer/ontology_modularization.html.
To learn more about OntoLearner:
Documentation website: https://ontolearner.readthedocs.io/
Hugging Face: https://huggingface.co/collections/SciKnowOrg/ontolearner-benchmarking
💡 To contribute (bug reports or features), see the guidelines: https://github.com/sciknoworg/OntoLearner/blob/main/CONTRIBUTING.md
Phase 2 of the LLMs4OL Challenge focuses on the integration of challenge participant systems into the OntoLearner framework. This guide provides clear instructions for integrating your ontology learning approach as a custom Learner module within OntoLearner.
Option 1: Issue-Based Submission
Submit a GitHub issue containing:
A single, standalone Python script implementing your full learner approach
Complete working example script(s) with usage documentation
A documentation page (markdown) explaining your approach, requirements, and how to use it
Option 2: Pull Request Submission
Submit a Pull Request containing:
Your learner module integrated into OntoLearner's structure
Full source code with docstrings and type hints
Example usage script(s)
A documentation page in OntoLearner's docs (e.g., `docs/source/learners/your_learner.rst`)
Unit tests (recommended)
All OntoLearner learners inherit from a base learner class and implement its core functions (at minimum `load`, `fit`, and `predict`):
from abc import ABC
from typing import Any, Dict, List, Optional

class AutoLearner(ABC):
    """
    Abstract base class for ontology learning models.

    This class defines the standard interface for all learning models in OntoLearner,
    including retrieval-based, LLM-based, and hybrid approaches. All concrete learner
    implementations must inherit from this class and implement the required methods.
    """
    def __init__(self, **kwargs: Any):
        pass

    def load(self, **kwargs: Any):
        pass

    def fit(self, train_data: Any, task: str, ontologizer: bool = True):
        pass

    def predict(self, eval_data: Any, task: str, ontologizer: bool = True) -> Any:
        pass

    def fit_predict(self, train_data: Any, eval_data: Any, task: str) -> Any:
        pass

    def _term_typing(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _taxonomy_discovery(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _non_taxonomic_re(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _text2onto(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def tasks_data_former(self, data: Any, task: str, test: bool = False) -> List[str | Dict[str, str]]:
        pass

    def tasks_ground_truth_former(self, data: Any, task: str) -> List[Dict[str, str]]:
        pass
The base class, AutoLearner, is available at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L23-L220 and can be imported as:
from ontolearner.base import AutoLearner
Once you have imported it, inherit your class from it and define the necessary functions shown in the skeleton above.
from typing import Any

from ontolearner.base import AutoLearner

class MyReuseLearner(AutoLearner):
    def __init__(self, **kwargs: Any):
        pass

    def load(self, **kwargs: Any):
        pass

    def fit(self, train_data: Any, task: str, ontologizer: bool = True):
        pass

    def predict(self, eval_data: Any, task: str, ontologizer: bool = True) -> Any:
        pass
Once you have done this and can use the class inside your example script, the preparations are complete. We will take care of the rest.
💡 Note: you may define additional functions inside the class; that is entirely up to you. As long as training happens in the `fit` method and prediction happens in the `predict` method, the overall logic stays the same!
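To make the skeleton concrete, here is a toy term-typing learner that follows the `fit`/`predict` contract. The base class below is a hypothetical stand-in defined locally so the sketch runs without the library installed; in a real submission you would instead write `from ontolearner.base import AutoLearner`, and the assumed `{"term": ..., "type": ...}` data layout is illustrative only.

```python
from collections import Counter
from typing import Any, Dict, List


class AutoLearner:  # hypothetical stand-in, NOT the real ontolearner base class
    def __init__(self, **kwargs: Any):
        pass


class MajorityTypeLearner(AutoLearner):
    """Toy baseline: always predicts the most frequent type seen in training."""

    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)
        self.most_common_type = None

    def load(self, **kwargs: Any):
        pass  # nothing to load for this baseline

    def fit(self, train_data: List[Dict[str, str]], task: str, ontologizer: bool = True):
        # train_data is assumed to look like [{"term": ..., "type": ...}, ...]
        counts = Counter(item["type"] for item in train_data)
        self.most_common_type = counts.most_common(1)[0][0]

    def predict(self, eval_data: List[Dict[str, str]], task: str, ontologizer: bool = True) -> Any:
        return [{"term": item["term"], "type": self.most_common_type}
                for item in eval_data]


train = [{"term": "wheat", "type": "crop"},
         {"term": "barley", "type": "crop"},
         {"term": "tractor", "type": "equipment"}]
learner = MajorityTypeLearner()
learner.fit(train, task="term-typing")
print(learner.predict([{"term": "maize"}], task="term-typing"))
# → [{'term': 'maize', 'type': 'crop'}]
```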
LLM (base class: AutoLLM)
Use large language models directly (OpenAI, local models, etc.)
Leverage prompt engineering and few-shot learning
Best for: leveraging pre-trained model knowledge and few-shot scenarios.
If your system falls into this category, you might consider looking at the AutoLLM class at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L222-L325
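For an LLM-based learner, much of the work is prompt construction. The snippet below sketches a few-shot prompt builder for term typing; the prompt format and example pairs are purely illustrative assumptions, not OntoLearner's actual AutoLLM template.

```python
# Hypothetical few-shot prompt builder for term typing.
# The template and examples are illustrative, not OntoLearner's real prompts.

FEW_SHOT_EXAMPLES = [
    ("wheat", "crop"),
    ("tractor", "equipment"),
]

def build_term_typing_prompt(term: str) -> str:
    lines = ["Assign a type to each term."]
    for example_term, example_type in FEW_SHOT_EXAMPLES:
        lines.append(f"Term: {example_term}\nType: {example_type}")
    lines.append(f"Term: {term}\nType:")  # the LLM completes this final line
    return "\n\n".join(lines)

prompt = build_term_typing_prompt("maize")
print(prompt)
```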
Retriever (base class: AutoRetriever)
Use dense retrieval models to match terms/types
No training phase; uses pre-trained embeddings
Best for: Fast inference, low-resource scenarios, or when you're working with a retrieval-based approach.
If your system falls into this category, you might consider looking at the AutoRetriever class at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L327-L424
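The retrieval idea, matching a query term against candidate strings by vector similarity, can be sketched with a toy character-trigram vectorizer. This is only an illustration of the matching principle; the real AutoRetriever uses pre-trained dense embedding models, and all function names below are made up for this sketch.

```python
import math
from collections import Counter
from typing import List

# Toy retrieval sketch: character-trigram vectors + cosine similarity.
# A real retriever would use pre-trained dense embeddings instead.

def trigram_vector(text: str) -> Counter:
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, candidates: List[str], k: int = 2) -> List[str]:
    q = trigram_vector(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, trigram_vector(c)), reverse=True)
    return ranked[:k]

print(top_k("agronomy", ["agronomy practice", "soil type", "harvesting"], k=1))
# → ['agronomy practice']
```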
💡 All the above-mentioned modules are available in OntoLearner, and their combination forms an AutoRAGLearner, so treat these base classes as helpers if your system includes LLM or retriever components. Here is a concrete example:
GloVe and Word2Vec are embedding models that can serve as retrievers, so we defined two retrievers that inherit from AutoRetriever; see the implementation at: https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/learner/retriever/embedding.py
On its own, a GloVe or Word2Vec retriever is not a learner model but a helper inside a retriever-based learner, which is why we defined an encapsulating AutoRetrieverLearner that inherits from the AutoLearner class. Here is the definition of AutoRetrieverLearner: https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/learner/retriever/learner.py
With these components separated, any model that wants to use Word2Vec no longer needs to define it; you simply call it from the library. This is why we ask participants to use AutoLearner as the base class for their learners and to reuse existing modules (where suitable, though not mandatory) when drafting their main learner. This saves a lot of time.
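The composition pattern behind a RAG-style learner can be sketched as a learner that delegates to a retriever component and an LLM component. Every class below is a hypothetical stand-in written for this sketch; it is not OntoLearner's AutoRAGLearner implementation, only an illustration of how the separated components plug together.

```python
from typing import List

# Schematic of the retriever + LLM composition idea. All classes here are
# hypothetical stand-ins, not OntoLearner's actual implementations.

class StubRetriever:
    def __init__(self, corpus: List[str]):
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int) -> List[str]:
        # naive substring match standing in for dense retrieval
        hits = [doc for doc in self.corpus if query.lower() in doc.lower()]
        return hits[:top_k]

class StubLLM:
    def generate(self, prompt: str) -> str:
        # a real component would call a language model here
        return f"[answer conditioned on: {prompt[:40]}...]"

class RAGLearner:
    def __init__(self, retriever: StubRetriever, llm: StubLLM, top_k: int = 3):
        self.retriever, self.llm, self.top_k = retriever, llm, top_k

    def predict(self, query: str) -> str:
        # retrieve supporting context, then condition the LLM on it
        context = "\n".join(self.retriever.retrieve(query, self.top_k))
        return self.llm.generate(f"Context:\n{context}\n\nQuestion: {query}")

rag = RAGLearner(StubRetriever(["AgrO covers agronomy concepts."]), StubLLM())
print(rag.predict("agronomy"))
```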
As an example integration, look at RWTH-DBIS Learner model:
Documentation page: https://ontolearner.readthedocs.io/learners/llms4ol_challenge/rwthdbis_learner.html
Example script: https://github.com/sciknoworg/OntoLearner/blob/main/examples/llm_learner_rwthdbis_taxonomy_discovery.py
💡Have questions? Open an issue on OntoLearner, and we will clarify your concern — your question might help others too!