The widespread adoption of large language models (LLMs) has enabled major advances in Knowledge Extraction (KE), understanding, and reasoning over large-scale unstructured data. At the same time, KE has emerged as a make-or-break bottleneck in the real-world adoption of LLMs: the substantial computational requirements of state-of-the-art models impede their scalability and practical deployment, particularly in resource-constrained environments such as enterprise systems with strict latency requirements. Finance, healthcare, legal technology, and web-scale analytics all need systems that pull structured facts from noisy, heterogeneous data while respecting tight latency and memory budgets, and under such budgets the quality bar set by today's 10B-parameter models is rarely met. The 1st Workshop on Small and Efficient LLMs for Knowledge Extraction (SmaLLEXT) focuses on the development and application of small and efficient LLMs for effective knowledge and information extraction.
Recent progress in model compression, quantization, pruning, retrieval-augmented generation (RAG), and efficient fine-tuning has shown that smaller LLMs can achieve competitive performance on a variety of downstream tasks. However, efforts around scalable KE remain fragmented across sub-fields such as NLP, information retrieval, and knowledge representation.
This workshop closes that gap by bringing the efficiency and extraction sub-communities together at CIKM to focus on data-centric, application-driven method development. It emphasizes real-world industrial and business applications, where large-scale data are abundant but often noisy, dynamic, and difficult to process reliably. The workshop convenes researchers and practitioners from academia and industry to examine strategies for compressing, distilling, specializing, and accelerating LLMs while maintaining extraction accuracy and robustness against hallucination, and it explores how these models can support structured data extraction from the diverse formats encountered on the Web, in enterprise data stores, and across multimodal documents.
Design of compact transformer variants
Sparse and modular networks
Lightweight language models
Quantization, pruning, distillation, and low-rank adaptation methods
Hardware-aware model optimization
Real-time acceleration techniques
Zero-shot vs. few-shot templates
Chain-of-thought prompt design
Iterative prompt refinement
Instruction phrasing best practices
Retrieval-augmented generation (RAG)
Symbolic reasoning to support distilled LLMs
Lightweight memory-augmented models
Domain adaptation and continual fine-tuning
Parameter-efficient tuning strategies for adaptation
Transfer learning approaches for specialized domains
Trade-offs in model capacity for multi-schema coverage
Domain-aware adaptive sparsity patterns
Hybrid breadth-depth pipelines (specialist modules and generalist core)
Evaluation protocols for depth vs. breadth
Unstructured text (e.g., named entity recognition, relation extraction, entity linking)
Semi-structured data (e.g., tables, forms, web pages)
Multimodal data (e.g., images, PDFs, charts, and scans)
Multilingual transfer and cross-lingual prompting
Data augmentation for low-resource languages
Benchmarks and datasets for evaluating small and efficient models
Interpretability and explainability
Challenges of measuring faithfulness and detecting hallucinations
Crowd-sourced annotation with guidelines
LLM-assisted active learning loops
Weak supervision via heuristic labeling
Annotation quality control metrics
Industry experience in building KE pipelines
Real-world deployments
Energy-efficient training and deployment
Drift detection in extracted knowledge over time
Canary deployments and A/B testing
Defense against prompt injections
Data-poisoning mitigation
Robustness to private data memorization (or extraction)
Fairness and bias in specialized small models
Privacy-preserving characteristics of small LLMs in KE
Memorization of public datasets/information
All deadlines are at 11:59pm in the Anywhere on Earth (AoE) time zone.
Paper submission deadline: September 1, 2025 (extended from August 29, 2025)
Paper acceptance notification: September 26, 2025
Paper camera-ready: October 31, 2025
Workshop date: November 14, 2025
All submissions must be PDFs formatted with the standard ACM Conference Proceedings template, as for the main conference.
The workshop invites three submission types:
Long Papers: 8 pages excluding references,
Short Papers: 4 pages excluding references,
Industry Papers: 4 pages excluding references.
Reviews will be double-blind.
Long and short papers will be assessed on quality, impact, novelty, depth, clarity, and generalizability. Industry papers should focus on challenges and practical solutions to significant real-world problems faced by industry practitioners; we do not expect these papers to release industrial datasets. For each accepted paper, at least one author must attend the workshop and present the paper or poster. Submitting papers that are identical (or substantially similar) to versions that have been published, accepted for publication, or submitted in parallel to other conferences (or any venue with published proceedings) is not allowed.
For further details, refer to the Submission Guidelines of the main conference.
Data Science Manager
at Cognism
Principal Data Scientist
at Cognism
Postdoctoral Researcher
at Université Paris-Saclay
Co-founder and CTO of distil labs
Lecturer in NLP at the School of Informatics, University of Edinburgh
Co-founder and CTO of Miniml.AI