LLMs4OL 2026: Large Language Models for Ontology Learning
The 3rd LLMs4OL Challenge @ ISWC 2026
ISWC 2026, Bari, Italy | 25-29 October
We are excited to introduce OntoLearner, a powerful, modern framework purpose-built for ontology learning tasks. This guide explains why you should use it and how to get started quickly.
OntoLearner is a modular, open-source Python framework designed for semi-automatic construction and enrichment of ontologies from unstructured sources, powered by LLMs. It bridges the gap between traditional ontology engineering and cutting-edge AI—giving you access to state-of-the-art language understanding without the burden of manual effort. Key advantages of OntoLearner over manual or legacy approaches are:
LLM-Powered Performance: Rather than relying on hand-crafted rules or static knowledge bases, OntoLearner harnesses foundation models for genuine semantic understanding. The framework automatically captures domain nuances and can combine retrieval-augmented generation (RAG) with LLM reasoning, meaning you get both speed and accuracy. You are not reinventing the wheel—you are standing on the shoulders of billions of tokens of pre-trained knowledge.
Comprehensive Ontology Support: You don't need to scramble for training data. OntoLearner ships with curated, production-grade ontologies ready to go. For example:
AgrO gives you over 1,000 concepts covering agriculture and agronomy.
SUMO is one of the largest formal public ontologies in existence and is already mapped to WordNet.
The Common Core Ontologies (CCO) provide reusable foundational concepts that work across domains.
Beyond these, the framework supports domain-specific ontologies for finance, material science and engineering, and more—all accessible through a simple load() call.
Battle-Tested Evaluation: Gone are the days of uncertainty about whether your model is actually good. OntoLearner includes built-in metrics (precision, recall, F1, and domain-specific benchmarks), standardized datasets extracted from real ontologies, and automatic train/test split handling. You can immediately see how you're performing and compare against baselines. The evaluation pipeline is comprehensive enough that you can trust your results.
Fast Prototyping to Competition-Ready Code: Most frameworks bog you down in boilerplate. OntoLearner gets you from zero to a trained model in under 50 lines of Python. The pipeline is modular, so you can easily swap out different retrievers, LLMs, and tasks as you experiment. And unlike research prototypes, the framework includes production-ready logging and error handling, so your code is actually reliable when it matters.
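To make the built-in evaluation concrete, here is a minimal sketch of how precision, recall, and F1 are typically computed for term typing over (term, type) pairs. OntoLearner's evaluation pipeline computes these for you; this standalone snippet only illustrates the idea and is not the framework's actual implementation.

```python
# Illustrative precision/recall/F1 over predicted vs. gold (term, type) pairs.
# This is a sketch, not OntoLearner's built-in evaluator.

def precision_recall_f1(predicted: set, gold: set):
    tp = len(predicted & gold)  # correctly predicted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

pred = {("wheat", "crop"), ("tractor", "crop")}
gold = {("wheat", "crop"), ("tractor", "equipment")}
print(precision_recall_f1(pred, gold))  # → (0.5, 0.5, 0.5)
```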
Installation
pip install ontolearner
Your First Ontology Learning Pipeline:
from ontolearner import LearnerPipeline, AgrO, train_test_split
# 1. Load a curated ontology (automatic download from Hugging Face)
ontology = AgrO()
ontology.load()
# 2. Extract and split the dataset
train_data, test_data = train_test_split(
    ontology.extract(),
    test_size=0.2,
    random_state=42
)
# 3. Build your learning pipeline (retriever + LLM)
pipeline = LearnerPipeline(
    retriever_id='sentence-transformers/all-MiniLM-L6-v2',  # Fast, accurate retrieval
    llm_id='Qwen/Qwen2.5-0.5B-Instruct',  # Efficient LLM
    batch_size=32,
    top_k=5
)
# 4. Train, predict, and evaluate in one call
outputs = pipeline(
    train_data=train_data,
    test_data=test_data,
    evaluate=True,
    task='term-typing'  # Or 'concept-discovery', 'relation-extraction', etc.
)
# 5. Check your results
print("Metrics:", outputs['metrics'])
print("Elapsed Time:", outputs['elapsed_time'])
That's it! You now have a trained ontology learning model with full evaluation metrics.
OntoLearner doesn't lock you into a single task type. Depending on what your task requires, you can tackle term typing, taxonomy discovery, and non-taxonomic relationship extraction. The same framework adapts to all of these without requiring you to rewrite everything from scratch. This means you have a starting point for your novel contribution to the challenge.
Ontologizer is a foundational module within OntoLearner that transforms ontologies into programmatically accessible Python objects, enabling seamless loading, inspection, and reuse across diverse domains. It supports multiple ontology formats (OWL, RDF, XML, TTL) and integrates metadata management, automated metric evaluation, and documentation generation to ensure ontologies are FAIR-compliant and traceable. By allowing users to import ontologies directly from web sources or HuggingFace repositories without manual file handling, Ontologizer simplifies ontology modularization and promotes scalable, cross-domain ontology enrichment. It supports version control and collaborative updates, and it optimizes performance for large ontologies with multiprocessing, providing a flexible, user-friendly foundation for ontology-driven workflows and research.
How does Ontologizer work?
from ontolearner import AgrO
# 1. Initialize an ontologizer from OntoLearner
ontology = AgrO()
# 2. Load the ontology automatically from Hugging Face
ontology.load()
# 3. Extract the learning task dataset
data = ontology.extract()
print(ontology)
# outputs:
# ontology_id: AgrO
# ontology_full_name: Agronomy Ontology (AgrO)
# domain: Agriculture
# category: Agronomy
# version: 1.0
# last_updated: 2022-11-02
# creator: The Crop Ontology Consortium
# license: Creative Commons 4.0
# format: RDF
# download_url: https://agroportal.lirmm.fr/ontologies/AGRO?p=summary
💡 Learn more about Ontologizer at https://ontolearner.readthedocs.io/ontologizer/ontology_modularization.html.
To learn more about OntoLearner:
Documentation website: https://ontolearner.readthedocs.io/
Hugging Face: https://huggingface.co/collections/SciKnowOrg/ontolearner-benchmarking
💡 To contribute (bug reports or features), see the guidelines: https://github.com/sciknoworg/OntoLearner/blob/main/CONTRIBUTING.md
Phase 2 of the LLMs4OL Challenge focuses on the integration of challenge participant systems into the OntoLearner framework. This guide provides clear instructions for integrating your ontology learning approach as a custom Learner module within OntoLearner.
Option 1: Issue-Based Submission
Submit a GitHub issue containing:
A single, standalone Python script implementing your full learner approach
Complete working example script(s) with usage documentation
A documentation page (markdown) explaining your approach, requirements, and how to use it
Option 2: Pull Request Submission
Submit a Pull Request containing:
Your learner module integrated into OntoLearner's structure
Full source code with docstrings and type hints
Example usage script(s)
A documentation page in OntoLearner's docs (e.g., `docs/source/learners/your_learner.rst`)
Unit tests (recommended)
All OntoLearner learners inherit from a base learner class and implement its core functions (at minimum `load`, `fit`, and `predict`):
from abc import ABC
from typing import Any, Dict, List, Optional

class AutoLearner(ABC):
    """
    Abstract base class for ontology learning models.

    This class defines the standard interface for all learning models in OntoLearner,
    including retrieval-based, LLM-based, and hybrid approaches. All concrete learner
    implementations must inherit from this class and implement the required methods.
    """
    def __init__(self, **kwargs: Any):
        pass

    def load(self, **kwargs: Any):
        pass

    def fit(self, train_data: Any, task: str, ontologizer: bool = True):
        pass

    def predict(self, eval_data: Any, task: str, ontologizer: bool = True) -> Any:
        pass

    def fit_predict(self, train_data: Any, eval_data: Any, task: str) -> Any:
        pass

    def _term_typing(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _taxonomy_discovery(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _non_taxonomic_re(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def _text2onto(self, data: Any, test: bool = False) -> Optional[Any]:
        pass

    def tasks_data_former(self, data: Any, task: str, test: bool = False) -> List[str | Dict[str, str]]:
        pass

    def tasks_ground_truth_former(self, data: Any, task: str) -> List[Dict[str, str]]:
        pass
The base class, AutoLearner, is available at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L23-L220 and can be imported as:
from ontolearner.base import AutoLearner
Once you have imported it, inherit your class from it and define the necessary functions shown in the skeleton above.
from typing import Any

from ontolearner.base import AutoLearner

class MyReuseLearner(AutoLearner):
    def __init__(self, **kwargs: Any):
        pass

    def load(self, **kwargs: Any):
        pass

    def fit(self, train_data: Any, task: str, ontologizer: bool = True):
        pass

    def predict(self, eval_data: Any, task: str, ontologizer: bool = True) -> Any:
        pass
Once you have done this and can use the class inside your example script, the preparations are complete. We will take care of the rest.
💡 Note: you may define additional functions inside the class; that is entirely up to you. As long as training happens in the `fit` method and prediction happens in the `predict` method, the overall logic stays the same!
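To make the skeleton concrete, here is a toy term-typing learner that follows the `fit`/`predict` contract. The base class below is a hypothetical stand-in defined locally so the sketch runs without the library installed; in a real submission you would instead write `from ontolearner.base import AutoLearner`, and the assumed `{"term": ..., "type": ...}` data layout is illustrative only.

```python
from collections import Counter
from typing import Any, Dict, List


class AutoLearner:  # hypothetical stand-in, NOT the real ontolearner base class
    def __init__(self, **kwargs: Any):
        pass


class MajorityTypeLearner(AutoLearner):
    """Toy baseline: always predicts the most frequent type seen in training."""

    def __init__(self, **kwargs: Any):
        super().__init__(**kwargs)
        self.most_common_type = None

    def load(self, **kwargs: Any):
        pass  # nothing to load for this baseline

    def fit(self, train_data: List[Dict[str, str]], task: str, ontologizer: bool = True):
        # train_data is assumed to look like [{"term": ..., "type": ...}, ...]
        counts = Counter(item["type"] for item in train_data)
        self.most_common_type = counts.most_common(1)[0][0]

    def predict(self, eval_data: List[Dict[str, str]], task: str, ontologizer: bool = True) -> Any:
        return [{"term": item["term"], "type": self.most_common_type}
                for item in eval_data]


train = [{"term": "wheat", "type": "crop"},
         {"term": "barley", "type": "crop"},
         {"term": "tractor", "type": "equipment"}]
learner = MajorityTypeLearner()
learner.fit(train, task="term-typing")
print(learner.predict([{"term": "maize"}], task="term-typing"))
# → [{'term': 'maize', 'type': 'crop'}]
```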
LLM (base class: AutoLLM)
Use large language models directly (OpenAI, local models, etc.)
Leverage prompt engineering and few-shot learning
Best for: leveraging pre-trained model knowledge and few-shot scenarios.
If your system falls into this category, you might consider looking at the AutoLLM class at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L222-L325
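For an LLM-based learner, much of the work is prompt construction. The snippet below sketches a few-shot prompt builder for term typing; the prompt format and example pairs are purely illustrative assumptions, not OntoLearner's actual AutoLLM template.

```python
# Hypothetical few-shot prompt builder for term typing.
# The template and examples are illustrative, not OntoLearner's real prompts.

FEW_SHOT_EXAMPLES = [
    ("wheat", "crop"),
    ("tractor", "equipment"),
]

def build_term_typing_prompt(term: str) -> str:
    lines = ["Assign a type to each term."]
    for example_term, example_type in FEW_SHOT_EXAMPLES:
        lines.append(f"Term: {example_term}\nType: {example_type}")
    lines.append(f"Term: {term}\nType:")  # the LLM completes this final line
    return "\n\n".join(lines)

prompt = build_term_typing_prompt("maize")
print(prompt)
```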
Retriever (base class: AutoRetriever)
Use dense retrieval models to match terms/types
No training phase; uses pre-trained embeddings
Best for: Fast inference, low-resource scenarios, or when you're working with a retrieval-based approach.
If your system falls into this category, you might consider looking at the AutoRetriever class at https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/base/learner.py#L327-L424
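The retrieval idea, matching a query term against candidate strings by vector similarity, can be sketched with a toy character-trigram vectorizer. This is only an illustration of the matching principle; the real AutoRetriever uses pre-trained dense embedding models, and all function names below are made up for this sketch.

```python
import math
from collections import Counter
from typing import List

# Toy retrieval sketch: character-trigram vectors + cosine similarity.
# A real retriever would use pre-trained dense embeddings instead.

def trigram_vector(text: str) -> Counter:
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, candidates: List[str], k: int = 2) -> List[str]:
    q = trigram_vector(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, trigram_vector(c)), reverse=True)
    return ranked[:k]

print(top_k("agronomy", ["agronomy practice", "soil type", "harvesting"], k=1))
# → ['agronomy practice']
```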
💡 All the above-mentioned modules are available in OntoLearner, and their combination forms an AutoRAGLearner, so treat these base classes as helpers if your system includes LLM or retriever components. Here is a concrete example:
GloVe and Word2Vec are embedding models that can serve as retrievers, so we defined two retrievers that inherit from AutoRetriever; see the implementation at: https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/learner/retriever/embedding.py
On its own, a GloVe or Word2Vec retriever is not a learner model but a helper inside a retriever-based learner, which is why we defined an encapsulating AutoRetrieverLearner that inherits from the AutoLearner class. Here is the definition of AutoRetrieverLearner: https://github.com/sciknoworg/OntoLearner/blob/main/ontolearner/learner/retriever/learner.py
With these components separated, any model that wants to use Word2Vec no longer needs to define it; you simply call it from the library. This is why we ask participants to use AutoLearner as the base class for their learners and to reuse existing modules (where suitable, though not mandatory) when drafting their main learner. This saves a lot of time.
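The composition pattern behind a RAG-style learner can be sketched as a learner that delegates to a retriever component and an LLM component. Every class below is a hypothetical stand-in written for this sketch; it is not OntoLearner's AutoRAGLearner implementation, only an illustration of how the separated components plug together.

```python
from typing import List

# Schematic of the retriever + LLM composition idea. All classes here are
# hypothetical stand-ins, not OntoLearner's actual implementations.

class StubRetriever:
    def __init__(self, corpus: List[str]):
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int) -> List[str]:
        # naive substring match standing in for dense retrieval
        hits = [doc for doc in self.corpus if query.lower() in doc.lower()]
        return hits[:top_k]

class StubLLM:
    def generate(self, prompt: str) -> str:
        # a real component would call a language model here
        return f"[answer conditioned on: {prompt[:40]}...]"

class RAGLearner:
    def __init__(self, retriever: StubRetriever, llm: StubLLM, top_k: int = 3):
        self.retriever, self.llm, self.top_k = retriever, llm, top_k

    def predict(self, query: str) -> str:
        # retrieve supporting context, then condition the LLM on it
        context = "\n".join(self.retriever.retrieve(query, self.top_k))
        return self.llm.generate(f"Context:\n{context}\n\nQuestion: {query}")

rag = RAGLearner(StubRetriever(["AgrO covers agronomy concepts."]), StubLLM())
print(rag.predict("agronomy"))
```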
As an example integration, look at RWTH-DBIS Learner model:
Documentation page: https://ontolearner.readthedocs.io/learners/llms4ol_challenge/rwthdbis_learner.html
Example script: https://github.com/sciknoworg/OntoLearner/blob/main/examples/llm_learner_rwthdbis_taxonomy_discovery.py
💡Have questions? Open an issue on OntoLearner, and we will clarify your concern — your question might help others too!