LLMs4OL 2025: Large Language Models for Ontology Learning
The 2nd LLMs4OL Challenge @ ISWC 2025
ISWC 2025, Nara, Japan | 2-6 November
Hamed Babaei Giglou, Jennifer D'Souza, Nandana Mihindukulasooriya, and Sören Auer. LLMs4OL 2025 Overview: The 2nd Large Language Models for Ontology Learning Challenge
Rashin Rahnamoun and Mehrnoush Shamsfard. SBU-NLP at LLMs4OL 2025 Tasks A, B, and C: Stage-Wise Ontology Construction Through LLMs Without Any Training Procedure
Abstract: Automated ontology construction is a challenging task that traditionally requires extensive domain expertise, data preprocessing, and resource-intensive model training. While learning-based methods with fine-tuning are common, they often suffer from high computational costs and limited generalizability across domains. This paper explores a fully automated approach that leverages powerful large language models (LLMs) through prompt engineering, eliminating the need for training or fine-tuning. We participated in the LLMs4OL 2025 shared task, which includes four subtasks: extracting ontological terms and types (Text2Onto), assigning generalized types to terms (Term Typing), discovering taxonomic relations (Taxonomy Discovery), and extracting non-taxonomic semantic relations (Non-Taxonomic Relation Extraction). Our team focused on the first three tasks, using stratified random sampling, simple random sampling, and chunking-based strategies to fit training examples into prompts despite context-window size limits. This simple yet general approach has proven effective across these tasks, enabling high-quality ontology construction without additional annotations or training. Additionally, we show that pretrained sentence embedding models ranging from 0.1B to 1.5B parameters perform comparably to a simple F1 token-overlap baseline in taxonomy discovery, suggesting that embedding-based methods may not always offer significant advantages. Our findings highlight that prompt-based strategies with modern LLMs enable efficient, scalable, and domain-independent ontology construction, providing a promising alternative to traditional, resource-heavy methods.
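The sampling idea described above is straightforward to sketch. The following is a minimal illustration, not the authors' code: training (term, type) pairs are stratified by type so rare types stay represented, then packed into a few-shot prompt. All function and variable names are invented for the example.

```python
import random
from collections import defaultdict

def stratified_examples(train_pairs, per_type, seed=0):
    """Sample up to per_type (term, type) examples per type label,
    so rare types remain represented in the prompt."""
    random.seed(seed)
    by_type = defaultdict(list)
    for term, typ in train_pairs:
        by_type[typ].append((term, typ))
    sample = []
    for items in by_type.values():
        sample.extend(random.sample(items, min(per_type, len(items))))
    random.shuffle(sample)
    return sample

def build_prompt(examples, query_term):
    lines = ["Assign an ontological type to each term.", ""]
    lines += [f"Term: {t}\nType: {y}\n" for t, y in examples]
    lines.append(f"Term: {query_term}\nType:")
    return "\n".join(lines)

train = [("oak", "tree"), ("pine", "tree"), ("wolf", "mammal"), ("lynx", "mammal")]
print(build_prompt(stratified_examples(train, per_type=1), "birch"))
```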
Xinyi Zhao, Kevin Drake, Caroline Watanabe, Yuya Sasaki, and Hidetaka Hando. LABKAG at LLMs4OL 2025 Tasks A and C: Context-Rich Prompting for Ontology Construction
Abstract: This paper presents LABKAG's submission to the LLMs4OL 2025 Challenge, focusing on ontology construction from domain-specific text using large language models (LLMs). Our core methodology prioritizes prompt design over fine-tuning or external knowledge, demonstrating its effectiveness in generating structured knowledge. For Task A (Text2Onto: extracting ontological terms and types), we utilized a locally deployed Qwen3-8B model, while for Task C (Taxonomy Discovery: identifying taxonomic hierarchies), we evaluated the performance of GPT-4o-mini and Gemini 2.5 Pro. Our experiments consistently show that incorporating in-domain examples and providing richer context within prompts significantly enhances performance. These results confirm that well-engineered prompts enable LLMs to effectively extract entities and their hierarchical relationships, offering a lightweight, adaptable, and generalizable approach to structured knowledge extraction.
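As a concrete illustration of "richer context within prompts", here is a hedged sketch of a few-shot taxonomy prompt of the kind Task C calls for; the template wording, example triples, and helper name are assumptions, not LABKAG's actual prompts.

```python
def taxonomy_prompt(parent, child, examples, context=""):
    """Few-shot yes/no prompt asking whether child is-a parent.
    examples holds (child, parent, answer) triples from the same domain;
    context can carry short type descriptions for extra grounding."""
    parts = ["Decide whether the first type is a subtype of the second."]
    if context:
        parts.append(f"Background: {context}")
    for c, p, a in examples:
        parts.append(f"Is '{c}' a subtype of '{p}'? Answer: {a}")
    parts.append(f"Is '{child}' a subtype of '{parent}'? Answer:")
    return "\n".join(parts)

demo = [("granite", "rock", "yes"), ("rock", "granite", "no")]
print(taxonomy_prompt("material", "metal", demo,
                      context="Types come from a materials-science ontology."))
```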
Aleksandra Beliaeva and Temurbek Rahmatullaev. Alexbek at LLMs4OL 2025 Tasks A, B, and C: Heterogeneous LLM Methods for Ontology Learning (Few-Shot Prompting, Ensemble Typing, and Attention-Based Taxonomies)
Abstract: We present a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Our approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling, each tailored to the demands of the respective task. For Task A, we jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline. Training data was reformulated into a mapping from documents to their terms and types, while test-time inference leverages semantically similar training examples. This single-pass method requires no model fine-tuning and improves overall performance through lexical augmentation. Task B, which involves assigning types to given terms, is handled via a dual strategy. In the few-shot setting (for domains with labeled training data), we reuse the RAG scheme with few-shot prompting. In the zero-shot setting (for previously unseen domains), we use a zero-shot classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting. In Task C, we model taxonomy discovery as graph inference. Using embeddings of type labels, we train a lightweight cross-attention layer to predict is-a relations by approximating a soft adjacency matrix. These modular, task-specific solutions enabled us to achieve top-ranking results on the official leaderboard across all three tasks. Taken together, these strategies showcase the scalability, adaptability, and robustness of LLM-based architectures for ontology learning across heterogeneous domains.
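The zero-shot classifier for Task B lends itself to a short sketch. The version below combines per-model cosine scores, using each model's best-versus-runner-up margin as its confidence weight; the margin rule is an assumption, since the abstract only says the weighting is confidence-based, and the embeddings here are random stand-ins.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def ensemble_type(term_vecs, type_vecs):
    """term_vecs[m]: term embedding from model m;
    type_vecs[m][t]: embedding of candidate type t under model m.
    Each model votes with cosine scores weighted by its score margin."""
    combined = {}
    for m, tv in term_vecs.items():
        scores = {t: cosine(tv, v) for t, v in type_vecs[m].items()}
        ranked = sorted(scores.values(), reverse=True)
        weight = ranked[0] - ranked[1] if len(ranked) > 1 else 1.0
        for t, s in scores.items():
            combined[t] = combined.get(t, 0.0) + weight * s
    return max(combined, key=combined.get)

rng = np.random.default_rng(0)
types = ["material", "process"]
term_vecs = {"m1": rng.normal(size=8), "m2": rng.normal(size=8)}
type_vecs = {m: {t: rng.normal(size=8) for t in types} for m in term_vecs}
print(ensemble_type(term_vecs, type_vecs))
```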
Insan-Aleksandr Latipov, Mike Holenderski, and Nirvana Meratnia. IRIS at LLMs4OL 2025 Tasks B, C, and D: Enhancing Ontology Learning through Data Enrichment and Type Filtering
Abstract: Ontology Learning (OL) automates extracting structured knowledge from unstructured data. We study how model-agnostic data manipulations can boost the performance of Large Language Models (LLMs) on three OL tasks, i.e., term typing, taxonomy discovery, and non-taxonomic relation extraction, from the LLMs4OL 2025 Challenge. We investigate two input-enrichment techniques, i.e., (i) data augmentation and (ii) the addition of term and type definitions, which expand the information supplied to an LLM. Complementing the enrichment techniques, we also study a pruning technique, i.e., a similarity-based type filtering technique that narrows the candidate space in taxonomy discovery and non-taxonomic relation extraction to the most semantically relevant types. When applied individually, each technique boosts precision-recall metrics over the vanilla setting where an LLM is trained on the original data. However, applied together they yield the best scores in five out of the seven ontology-task combinations, showing synergistic benefits. Our findings show that careful curation of inputs can itself yield substantial performance improvements. Codebase and all training artifacts are available at our GitHub repository: https://github.com/AFigaro/LLMs4OL_2025/tree/main
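The type-filtering step has a natural one-function sketch using an off-the-shelf sentence embedder; the model name and cutoff below are assumptions, and this is an illustration of the pruning idea rather than the IRIS code (see the linked repository for the real implementation).

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder

def filter_candidate_types(term, candidate_types, top_k=10):
    """Keep only the top_k types most similar to the term, shrinking
    the candidate space the LLM must reason over."""
    term_emb = model.encode(term, convert_to_tensor=True)
    type_embs = model.encode(candidate_types, convert_to_tensor=True)
    sims = util.cos_sim(term_emb, type_embs)[0]
    top = sims.topk(min(top_k, len(candidate_types)))
    return [candidate_types[i] for i in top.indices.tolist()]

print(filter_candidate_types("granite", ["rock", "mammal", "vehicle", "mineral"], top_k=2))
```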
Ryan Roche, Kathryn Gray, Jaimie Murdock, and Douglas C. Crowder. ELLMO at LLMs4OL 2025 Tasks A and D: LLM-Based Term, Type, and Relationship Extraction
Abstract: This paper presents an approach to building ontologies using Large Language Models (LLMs), addressing the need shared by many domains for high-quality knowledge extraction from vast stores of text data. In particular, we focus on extracting terms and types from text and discovering relationships between types. This work was done within the 2025 LLMs4OL Challenge, which provided quality training and testing data as well as several defined tasks. Many teams competed to produce the best output data across many domains. Our methodology involved prompt engineering, classification, clustering, and vector databases. For the first task, discovering terms and types, we used two methods: (1) directly tailoring prompts to find the terms and types separately, and (2) an approach that found both terms and types and classified them afterwards. For discovering relationships, we used clustering and vector databases to reduce the number of potential edges, then queried the LLM for a probability for each potential edge. While our findings indicate promising results, further work is necessary to address challenges related to processing large datasets, particularly in optimizing efficiency and accuracy.
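The edge-reduction step can be sketched with a plain k-means pass: only type pairs that share a cluster survive as candidate edges for the LLM to score. This is a simplified stand-in, with random embeddings in place of a vector database and the LLM probability query elided.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_edges(labels, embeddings, n_clusters=5, seed=0):
    """Cluster type embeddings and propose only within-cluster pairs,
    cutting the O(n^2) edge space before any LLM call."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(embeddings)
    edges = []
    for c in range(n_clusters):
        members = [i for i, a in enumerate(km.labels_) if a == c]
        edges += [(labels[i], labels[j]) for i in members for j in members if i != j]
    return edges

rng = np.random.default_rng(0)
names = [f"type_{i}" for i in range(20)]
pruned = candidate_edges(names, rng.normal(size=(20, 32)), n_clusters=4)
print(f"{len(pruned)} candidate edges instead of {20 * 19}")
```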
Pankaj Goyal, Sumit Singh, and Uma Shanker Tiwary. silp_nlp at LLMs4OL 2025 Tasks A, B, C, and D: Clustering-Based Ontology Learning Using LLMs
Abstract: This paper presents the participation of the silp_nlp team in the LLMs4OL 2025 Challenge, where we addressed four core tasks in ontology learning: Text2Onto (Task A), Term Typing (Task B), Taxonomy Discovery (Task C), and Non-Taxonomic Relation Extraction (Task D). Building on our experience from the first edition, we proposed a clustering-enhanced methodology grounded in large language models (LLMs), integrating domain-adapted transformer models such as pranav-s/MaterialsBERT, dmis-lab/biobert-v1.1, and proprietary LLMs from Grok. Our framework combined lexical and semantic clustering with adaptive prompting to tackle entity and type extraction, semantic classification, hierarchical structure discovery, and complex relation modeling. Experimental results across 18 subtasks highlight the strength of our approach, particularly in blind and zero-shot scenarios. Notably, our model achieved multiple first-rank scores in taxonomy discovery and non-taxonomic relation extraction subtasks, validating the efficacy of clustering when coupled with semantically specialized LLMs. This work demonstrates that clustering-driven, LLM-based approaches can advance robust and scalable ontology learning across diverse domains.
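One way to picture "lexical and semantic clustering" is a blended distance: token-overlap similarity mixed with embedding cosine, fed to an agglomerative clusterer. The blend weight, the Jaccard choice, and the random embeddings below are all assumptions made for illustration, not silp_nlp's actual pipeline.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def blended_clusters(terms, embeddings, alpha=0.5, n_clusters=2):
    """Cluster terms on alpha * semantic + (1 - alpha) * lexical similarity."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sem = E @ E.T
    lex = np.array([[jaccard(a, b) for b in terms] for a in terms])
    dist = 1.0 - (alpha * sem + (1 - alpha) * lex)
    clusterer = AgglomerativeClustering(n_clusters=n_clusters,
                                        metric="precomputed", linkage="average")
    return clusterer.fit_predict(dist)

terms = ["igneous rock", "sedimentary rock", "oak tree"]
emb = np.random.default_rng(0).normal(size=(3, 8))
print(blended_clusters(terms, emb))
```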
Miquel Canal, José Ignacio Abreu, and Yoan Gutiérrez. SEMA at LLMs4OL 2025 Task C: Prompt-Decoupled Fine-Tuning on MatOnto with LLaMA
Abstract: This paper presents our participation in Task C (Taxonomy Discovery) of the LLMs4OL 2025 Challenge, which investigates the ability of Large Language Models (LLMs) to identify semantic and taxonomic relations between ontology types. Focusing on the MatOnto subtask, selected for its manageable size, we explore the performance of open-source models under resource constraints. We fine-tune LLaMA 3.1-8B using LoRA adapters and evaluate various strategies including contrastive negative sampling, prompt inversion, and system prompt variation. Inspired by recent findings on prompt sensitivity, we adopt a cross-template setup where the model is trained with one prompt format and tested with another semantically equivalent variant. Our experiments suggest that prompt decoupling can improve generalization and mitigate overfitting to specific phrasings. While our results are modest, they offer insights into the challenges of adapting LLMs to structured relation extraction tasks and highlight practical considerations for tuning under constrained resources.
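A hedged sketch of the cross-template setup: fine-tune with LoRA adapters under one prompt template and evaluate under a semantically equivalent variant. The hyperparameters, target modules, template wordings, and Hugging Face model id are assumptions, not the values reported in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.1-8B"  # assumed model id (gated on the Hub)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(BASE),
    LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
               target_modules=["q_proj", "v_proj"]),
)

# Train with one phrasing, test with an equivalent one (prompt decoupling).
TRAIN_TEMPLATE = "In MatOnto, is '{child}' a subtype of '{parent}'? Answer yes or no."
TEST_TEMPLATE = "Does '{child}' fall under the broader type '{parent}'? Reply yes or no."

def format_pair(child, parent, train=True):
    template = TRAIN_TEMPLATE if train else TEST_TEMPLATE
    return template.format(child=child, parent=parent)

print(format_pair("alloy", "material", train=False))
```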
Patipon Wiangnak, Thin Prabhong, Thiti Phuttaamart, Natthawut Kertkeidkachorn, and Kiyoaki Shirai. The DREAM-LLMs at LLMs4OL 2025 Task B: A Deliberation-based Reasoning Ensemble Approach with Multiple Large Language Models for Term Typing in Low-Resource Domains
Abstract: The LLMs4OL Challenge at ISWC 2025 aims to advance the integration of Large Language Models (LLMs) and Ontology Learning (OL) across four key tasks: (1) Text2Onto, (2) Term Typing, (3) Taxonomy Discovery, and (4) Non-Taxonomic Relation Extraction. Our work focuses on the Term Typing Prediction task, where prompting LLMs has shown strong potential. However, in low-resource domains, relying on a single LLM is often insufficient due to domain-specific knowledge gaps and limited exposure to specialized terminology, which can lead to inconsistent and biased predictions. To address this challenge, we propose DREAM-LLMs: a Deliberation-based Reasoning Ensemble Approach with Multiple Large Language Models. Our method begins by crafting few-shot prompts using training examples and querying four advanced LLMs independently—ChatGPT-4o, Claude Sonnet 4, DeepSeek-V3, and Gemini 2.5 Pro. Each model outputs a predicted label along with a brief justification. To reduce model-specific bias, we introduce a deliberation step, in which one LLM reviews the predictions and justifications from the other three to produce a final decision. We evaluate DREAM-LLMs on three low-resource domain datasets—OBI, MatOnto, and SWEET—using F1-score as the evaluation metric. The results—0.908 for OBI, 0.568 for MatOnto, and 0.593 for SWEET—demonstrate that our ensemble strategy significantly improves performance, highlighting the promise of collaborative LLM reasoning in low-resource environments.
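The deliberation step reduces to a simple control flow: collect (label, justification) pairs from several proposer models, then hand all of them to one judge model for the final call. The sketch below uses stub callables in place of the four commercial LLM APIs, and the prompt wording is an assumption.

```python
def deliberate(term, candidate_types, proposers, judge):
    """Each proposer returns (label, justification); the judge reviews
    all peer predictions and returns the final type label."""
    proposals = [p(term, candidate_types) for p in proposers]
    summary = "\n".join(f"Model {i + 1}: {label} because {why}"
                        for i, (label, why) in enumerate(proposals))
    prompt = (f"Term: {term}\nCandidate types: {', '.join(candidate_types)}\n"
              f"Peer predictions:\n{summary}\n"
              "Considering the justifications, give the single best type.")
    return judge(prompt)

stub = lambda term, types: (types[0], "closest match to the few-shot examples")
final = deliberate("electrolyte", ["material", "process"],
                   proposers=[stub, stub, stub], judge=lambda p: "material")
print(final)
```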
Alireza Esmaeili Fridouni and Mahsa Sanaei. Phoenixes at LLMs4OL 2025 Task A: Ontology Learning with Large Language Models Reasoning
Abstract: Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in various natural language understanding tasks, including Ontology Learning (OL), where they automatically or semi-automatically extract knowledge from unstructured data. This work presents our contribution to the LLMs4OL Challenge at the ISWC 2025 conference, focusing on Task A, which comprises two subtasks: term extraction (SubTask A1) and type extraction (SubTask A2). We evaluate three state-of-the-art LLMs — Qwen2.5-72B-Instruct, Mistral-Small-24B-Instruct-2501, and LLaMA-3.3-70B-Instruct — across three domain-specific datasets: Ecology, Scholarly, and Engineering. In this paper, we adopt a Chain-of-Thought (CoT) Few-Shot Prompting strategy to guide the models in identifying relevant domain terms and assigning their appropriate ontology types. CoT prompting enables LLMs to generate intermediate reasoning steps before producing final predictions, which is particularly beneficial for ontology learning tasks that require contextual reasoning beyond surface-level term matching. Model performance is evaluated using the official precision, recall, and F1-score metrics provided by the challenge organizers. The results reveal important insights into the strengths and limitations of LLMs in ontology learning tasks.
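The CoT few-shot recipe can be shown as a template: each worked example pairs a passage with explicit reasoning before the term list, and the model is asked to reproduce that pattern. Example text and wording are invented for illustration, not taken from the Phoenixes prompts.

```python
COT_EXAMPLE = """Text: Wolves prey on deer in boreal forests.
Reasoning: 'Wolves' and 'deer' are organisms and 'boreal forests' is a
habitat; all three are domain concepts worth modelling.
Terms: wolves, deer, boreal forests"""

def cot_term_prompt(passage, examples=(COT_EXAMPLE,)):
    """Chain-of-thought few-shot prompt for SubTask A1-style term
    extraction: reason step by step, then list the terms."""
    shots = "\n\n".join(examples)
    return ("Extract domain terms from the text. Think step by step, "
            f"then list the terms.\n\n{shots}\n\nText: {passage}\nReasoning:")

print(cot_term_prompt("Coral reefs host diverse fish species."))
```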
Rehenuma Ilman, Mehreen Rahman, and Samia Rahman. CUET Zenith at LLMs4OL 2025 Task C: Hybrid Embedding-LLM Architectures for Taxonomy Discovery
Abstract: Taxonomy discovery, the identification of hierarchical relationships within ontological structures, constitutes a foundational challenge in ontology learning. Our submission to the LLMs4OL 2025 challenge employs hybrid architectures to address this task across both biomedical (Subtask C1: OBI) and general-purpose (Subtask C5: SchemaOrg) knowledge domains. For C1, we integrated semantic clustering of Sentence-BERT embeddings with few-shot prompting using Qwen-3 (14B), enabling domain-specific hierarchy induction without task-specific fine-tuning. For C5, we introduced a cascaded validation framework, harmonizing deep semantic representations from the sentence transformer all-mpnet-base-v2, ensemble classification via XGBoost, and a hierarchical LLM-based reasoning pipeline utilizing TinyLlama and GPT-4o. To address inherent class imbalances, we employed SMOTE-based augmentation and gated inference thresholds. Empirical results demonstrate that our hybrid methodology achieves competitive performance, confirming that the judicious integration of classical machine learning with large language models yields efficient and scalable solutions for ontology structure induction. Code implementations are publicly available.
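The C5 pipeline's imbalance handling is easy to sketch end to end: oversample minority is-a pairs with SMOTE, train an XGBoost classifier on pair embeddings, and accept edges only above a confidence gate. Synthetic features stand in for the all-mpnet-base-v2 pair embeddings, and the threshold value is an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))            # stand-in pair embeddings
y = (rng.random(300) < 0.1).astype(int)   # imbalanced is-a labels

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)  # oversample minority
clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(X_bal, y_bal)

# Gated inference: keep an is-a edge only above a confidence threshold.
proba = clf.predict_proba(X[:5])[:, 1]
print(proba > 0.8)
```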
Chavakan Yimmark and Teeradaj Racharak. T-GreC at LLMs4OL 2025 Task B: A Report on Term-Typing Task of OBI dataset using LLM with k-Nearest Neighbors
Abstract: This report presents an approach that combines large language model (LLM) embeddings with k-nearest neighbors (k-NN) for the term-typing task on the OBI (Ontology for Biomedical Investigations) dataset. We investigate the effectiveness of transformer models, namely PubMedBERT, BioBERT, DeBERTa-v3, and RoBERTa, with k-NN classification using each model's embeddings. Our experimental results demonstrate that fine-tuned LLMs not only have the capability to do term typing on their own but also learn effective embeddings that k-NN can exploit to solve the task, with RoBERTa achieving the highest F1 score of 0.827 and k-NN using embeddings from the same model reaching 0.862. The study reveals that embeddings from transformer models, when used as semantic representations for similarity-based methods, improve classification accuracy in this specific case.
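The embedding-plus-k-NN pipeline is compact enough to sketch in full; RoBERTa is shown since it scored best, but any encoder named above drops in. Mean pooling and the toy training pairs are assumptions made to keep the example self-contained.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier
from transformers import AutoModel, AutoTokenizer

NAME = "roberta-base"
tok = AutoTokenizer.from_pretrained(NAME)
enc = AutoModel.from_pretrained(NAME)

def embed(terms):
    """Mean-pooled last-hidden-state embeddings for a list of terms."""
    batch = tok(terms, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

train_terms, train_types = ["assay", "specimen"], ["process", "material"]
knn = KNeighborsClassifier(n_neighbors=1).fit(embed(train_terms), train_types)
print(knn.predict(embed(["blood sample"])))
```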