International Workshop on Learning from Small Data (LFSD 2025)
Co-Located With ECML-PKDD 2025
September 19th, 2025
Photo by Acediscovery. License: CC-BY-4.0
As machine learning continues to advance, the challenge of learning from small datasets has emerged as a critical area of research. In many real-world scenarios, obtaining large volumes of high-quality training data is often impractical or impossible. This limitation can stem from various factors, such as constraints in data collection, the need for early predictions in emerging domains, the absence of historical data for novel problems, or situations where available data originates from disparate tasks or contexts. These challenges underscore the importance of developing innovative approaches that enhance the entire lifecycle of machine learning models.
This workshop aims to explore innovative methodologies for effectively utilizing small data across various domains. A central focus will be on developing strategies and frameworks to overcome the challenges posed by limited data availability. We will examine how incorporating domain knowledge—such as principles from physics or other scientific fields—can guide model development, enabling robust learning even in data-scarce environments. Further, we will highlight advanced techniques, including data augmentation, semi-supervised learning, pseudo-labeling, active learning, few-shot learning, self-supervised learning, in-context learning, transfer learning, and other related topics, demonstrating their potential to improve learning outcomes when data is limited. Additionally, the workshop will address the role of experimental design within the broader research landscape.
By gathering researchers and practitioners from multiple fields, this workshop seeks to foster collaboration and cross-pollination of ideas. We encourage contributions that introduce novel problems, propose innovative solutions, or share practical experiences related to the small data challenge. Together, we will advance the understanding and application of learning from small datasets, paving the way for more robust and adaptable machine learning systems.
Workshop Topics
This full-day workshop addresses the challenges and opportunities of learning from small data, focusing on methodologies that optimize the entire learning pipeline under data-scarce conditions. The workshop will bring together researchers and practitioners from diverse disciplines, including physics-informed machine learning, few-shot learning, in-context learning, data augmentation, active learning, semi-supervised learning, transfer learning, and experimental design and optimization. Additionally, we will explore the role of surrogate models, which reduce the need for extensive data collection by approximating expensive-to-evaluate functions, making them particularly useful when data availability is limited. These approaches, among other solutions for learning in data-scarce environments, will serve as the foundation for sharing innovative techniques, discussing best practices, and exploring the frontiers of learning from small data. By emphasizing techniques that minimize reliance on large datasets, the workshop seeks to inspire advances in fields where data is inherently limited or costly to obtain.
We invite contributions that introduce novel problem settings, propose groundbreaking methodologies, or share practical experiences in deploying systems designed to operate effectively with small data. Submissions that highlight interdisciplinary approaches or address domain-specific challenges in small data learning are particularly encouraged. Additionally, we welcome thought-provoking questions or critical perspectives that aim to spark discussion and push the boundaries of this important research area. In particular, we invite contributions that address aspects including, but not limited to:
Physics and Math-Informed Machine Learning Techniques
Methods for employing physical laws to enhance learning with limited data
Methods for applying physics-based constraints, such as boundary conditions and physics-informed loss terms, in the learning process (see the sketch after this list)
Methods for handling ODEs and PDEs to model dynamic physical systems effectively
Methods for integrating control systems within learning frameworks to optimize performance and stability
Applications of these techniques in fields such as fluid dynamics and climate modeling
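To make these items concrete, here is a minimal physics-informed training sketch, assuming PyTorch and the toy decay ODE du/dt = -k·u; the network size, collocation grid, and equal loss weighting are illustrative assumptions, not recommendations:

```python
# Minimal physics-informed loss sketch (assumes PyTorch; toy ODE du/dt = -k*u).
# The ODE residual on unlabeled collocation points regularizes the model,
# which is what keeps learning stable when only a few labeled points exist.
import torch

torch.manual_seed(0)
k = 1.5                                       # known decay rate (domain knowledge)
t_data = torch.rand(8, 1)                     # only 8 labeled measurements
u_data = torch.exp(-k * t_data)               # toy observations of u(t)
t_phys = torch.linspace(0, 1, 64).reshape(-1, 1).requires_grad_(True)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    data_loss = torch.mean((model(t_data) - u_data) ** 2)
    u = model(t_phys)                         # predictions on collocation points
    du_dt = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    phys_loss = torch.mean((du_dt + k * u) ** 2)   # residual of du/dt = -k*u
    (data_loss + phys_loss).backward()        # equal weighting is a free choice
    opt.step()
```

The same pattern extends to PDEs, with boundary-condition terms added to the loss alongside the residual.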
Transfer Learning, Data Augmentation, Generating Synthetic Data, Few-Shot Learning, Active Learning, and In-Context Learning
Methods for adapting pre-trained models to improve performance with limited data
Techniques for applying data augmentation to artificially expand dataset size and diversity
Strategies for few-shot learning to enable effective learning with minimal labeled examples
Meta-learning frameworks to improve few-shot learning performance across diverse tasks
Methods for generating high-quality synthetic data to enrich existing datasets
Strategies for active learning to prioritize the most informative data points for labeling (a pool-based sketch follows this list)
Novel active learning strategies that operate under a low query budget
New methods for applying active learning in high-dimensional spaces (e.g., deep active learning)
Methods for selectively acquiring features to maximize information gain from small datasets
Strategies for effective label acquisition in multi-label active learning
Techniques for in-context learning to adapt models dynamically based on provided contextual information
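As one concrete instance of the active learning items above, the following pool-based uncertainty-sampling loop is a minimal sketch, assuming scikit-learn; the synthetic dataset, logistic-regression learner, and 20-query budget are illustrative assumptions:

```python
# Pool-based uncertainty sampling sketch (assumes scikit-learn is installed;
# dataset, model, and query budget are illustrative, not prescriptive).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
labeled = [int(np.flatnonzero(y == c)[0]) for c in (0, 1)]  # one seed label per class
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                                  # low query budget: 20 queries
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    query = pool[int(np.argmin(proba.max(axis=1)))]  # least-confident pool point
    labeled.append(query)                            # the oracle reveals y[query]
    pool.remove(query)

clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
print("accuracy with", len(labeled), "labels:", clf.score(X[pool], y[pool]))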
Semi-Supervised, Self-Supervised, Weak, and Federated Learning
Methods that use both labeled and unlabeled data to improve model performance in data-scarce environments
Generating pseudo-labels from model predictions to enhance learning from unlabeled data (see the sketch after this list)
Combining weak classifiers to create robust and generalizable ensemble models
Training models collaboratively across distributed devices while preserving data privacy
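For the pseudo-labeling item above, here is a minimal sketch using scikit-learn's SelfTrainingClassifier; the synthetic dataset, the 90% of labels hidden, and the 0.9 confidence threshold are illustrative assumptions:

```python
# Pseudo-labeling via self-training (assumes scikit-learn is installed).
# Unlabeled points are marked with -1; predictions above the confidence
# threshold are promoted to pseudo-labels and used for further training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_train = y.copy()
hidden = rng.choice(len(X), size=450, replace=False)
y_train[hidden] = -1                          # hide 90% of the labels

self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.9
)
self_training.fit(X, y_train)
print("accuracy on all points:", self_training.score(X, y))
```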
Challenges and Advances in Learning from Data-Scarce Environments
Methods capable of handling high-dimensional data efficiently in data-scarce environments
New human-in-the-loop learning and application scenarios, e.g., brain-computer interfaces and crowdsourcing
Techniques for adaptively learning from continuous data streams
Strategies for real-world deployment in dynamic environments
Methods for handling imbalanced data, data drift, and noisy labels (a class-weighting sketch follows this list)
Developing scalable frameworks for large-scale data streams
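As a minimal illustration of the imbalanced-data item above, the sketch below compares a plain classifier with a class-weighted one, assuming scikit-learn; the synthetic dataset and 95/5 imbalance are illustrative assumptions:

```python
# Class-weighting sketch for imbalanced small datasets (assumes scikit-learn).
# Reweighting the loss by inverse class frequency is a simple baseline before
# resorting to resampling or synthetic minority oversampling.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# small sample with a 95/5 class imbalance
X, y = make_classification(n_samples=200, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

print("plain   :", balanced_accuracy_score(y_te, plain.predict(X_te)))
print("weighted:", balanced_accuracy_score(y_te, weighted.predict(X_te)))
```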
Design of Experiments and Surrogate Models in Small Data Contexts
Methods for optimizing experimental design under limited data
Integrating prior knowledge and constraints for efficiency
Adaptive experimental designs that adjust dynamically as new observations arrive
Selecting the most informative experimental conditions
Leveraging surrogate models (e.g., Gaussian processes, Bayesian optimization) to reduce data collection needs (a sketch follows this list)
Applications of surrogate-assisted design in experiments, simulations, and optimization
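To illustrate surrogate-assisted design, here is a minimal Bayesian-optimization-style sketch, assuming scikit-learn and SciPy: a Gaussian process is fit to a handful of evaluations of a hypothetical expensive experiment, and the next run is chosen by expected improvement. The objective function, RBF kernel, and candidate grid are illustrative assumptions:

```python
# Surrogate-assisted experiment selection sketch (assumes scikit-learn, SciPy).
# A Gaussian process fit to a few evaluations proposes the next experiment via
# expected improvement, so far fewer real evaluations are needed.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                                  # hypothetical expensive experiment
    return np.sin(3 * x) + x

X = np.array([[0.2], [1.0], [2.5]])        # three initial runs
y = f(X).ravel()
candidates = np.linspace(0, 3, 200).reshape(-1, 1)

for _ in range(5):                         # five more runs, chosen adaptively
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[[np.argmax(ei)]]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best observed value:", y.max())
```

In practice the candidate grid would be replaced by a proper acquisition optimizer, but the data-saving logic is the same: each new experiment is placed where the surrogate expects the most gain.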
Important Dates:
Submission open: Thursday, May 15, 2025
Paper Submission Deadline: Saturday, June 14, 2025
Author Notification: Monday, July 14, 2025 (ECML-PKDD offers the early bird registration rate until Tuesday, July 22, 2025)
Camera Ready: Friday, September 5, 2025
Workshop: Friday, September 19, 2025 [At least one author of each accepted paper must be registered.]
Submission Instructions
We invite submissions of original work in the form of regular papers (8–16 pages) or extended abstracts (2–4 pages). All submissions will undergo a double-blind peer-review process, and accepted papers will be presented and discussed at the workshop. Extended abstracts are particularly suited for works-in-progress or industrial case studies. At least one author of each accepted paper is required to register for the conference and attend the workshop.
We plan to publish the workshop proceedings with Springer. Submissions will be made via the CMT system of the ECML-PKDD conference, and papers should be formatted in the Lecture Notes in Computer Science (LNCS) style. Springer's proceedings LaTeX templates are also available on Overleaf.
Reviews are double-blind; papers must not include information that reveals the authors' identities.
Committee
Organizing Committee (alphabetical, first name):
Alaa Tharwat: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Barbara Hammer: Bielefeld University, Germany
Bjarne Jaster: Bielefeld University, Germany
Markus Lange-Hegermann: OWL University of Applied Sciences and Arts, Lemgo, Germany
Michiel Straat: Bielefeld University, Germany
Wolfram Schenck: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Steering Committee:
Thorben Markmann: Bielefeld University, Germany
Daniel Leite: Paderborn University, Germany
Tarek Gaber: University of Salford, UK
Program Committee:
Essam Rashed: University of Hyogo, Japan
Andreas Rosskopf: Fraunhofer Institute for Integrated Systems and Device Technology IISB, Germany
Annika Junker: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Samir Moustafa: University of Vienna, Austria
Mohammed Fellaji: CentraleSupélec, Université Paris-Saclay, CNRS, LORIA, France
Peter Kuchling: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Andreas Besginow: OWL University of Applied Sciences and Arts, Lemgo, Germany
Christoph Berganski: Paderborn University, Paderborn, Germany
Dina Elmanakhly: Suez Canal University, Egypt
Hans Harder: Paderborn University, Paderborn, Germany
Jörn Tebbe: OWL University of Applied Sciences and Arts, Lemgo, Germany
Mona Selim: Suez Canal University, Egypt
Neevkumar Hareshbhai Manavar: Bielefeld University of Applied Sciences and Arts (HSBI), Bielefeld, Germany
Sameh Basha: Faculty of Science, Cairo University, Cairo, Egypt
Program of the Workshop
Time | Program | Presenter / Author
9:00 - 9:05 | Opening and Welcome |
9:05 - 9:30 | Masked Autoencoder Self-Pretraining for Defect Detection in Microelectronics (Ref: 211) | Nikolai Röhrich
9:30 - 9:55 | Learning from Less: Synthetic Clinical Data Augmentation for Predicting Cardiac Decompensation and Pulmonary Exacerbation (Ref: 56) | Pedro Matias
9:55 - 10:20 | Data Augmentation Using Diffusion Models with Geometric Pattern Masks for Industrial Defect Detection (Ref: 142) | Masaya Oirase
10:20 - 11:20 | First Poster Session + Coffee Break |
11:20 - 11:45 | Physics-Informed Diffusion Models for Unsupervised Anomaly Detection in Multivariate Time Series (Ref: 256) | Juhi Soni
11:45 - 12:10 | Improving Neural Network-Based Material Simulations with Domain-Specific Data Filtering and Atom-Specific Training (Ref: 151) | Meguru Yamazaki
12:10 - 12:35 | Evaluating Spatiotemporal Prediction Models in a Low-Data Regime (Ref: 153) | Andrzej Dulny
12:35 - 13:00 | Label Augmentation with Reinforced Labeling for Weak Supervision (Ref: 170) | Fabio Maresca
13:00 - 14:00 | Lunch Break |
14:00 - 14:25 | Should We Still Let Random Sampling Guide Model Performance? Investigating Exemplar Selection for Few-Shot Named-Entity Recognition (Ref: 174) | Quentin Telnoff
14:25 - 14:50 | Tailored Transformation Invariance for Industrial Anomaly Detection (Ref: 179) | Mariette Schönfeld
14:50 - 15:15 | Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations (Ref: 145) | Naoki Matsumura
15:15 - 15:40 | Varying Informativeness of Inductive Bias in Gaussian Processes Regression for Small Data (Ref: 237) | Andreas Besginow
15:40 - 16:05 | Active Learning for Cheap RUL Prediction in the CMAPSS Dataset (Ref: 244) | Esmaail Albarazi
16:05 - 16:30 | Learning Local and Global Prototypes with Optimal Transport for Unsupervised Anomaly Detection and Localization (Ref: 255) + Closing | Robin Trombetta
LNCS Style
The paper must be written in English and submitted as a PDF file in LNCS format.
You can download the LaTeX template or edit the template directly in Overleaf.
Presentation
All accepted papers will be presented in spotlight talks and/or poster sessions.
At least one author of each accepted paper must be registered for ECML-PKDD.
ECML-PKDD offers early bird registration until Tuesday, July 22nd, 2025.
Indexed Publishing
All accepted papers will be published in the ECML-PKDD joint post-workshop proceedings, which are indexed, e.g., by Google Scholar.
Dual Submission Policy
Submissions should report original work. Submissions that are identical or substantially similar to papers that have been published, have been submitted elsewhere, or are submitted elsewhere during the review period, will be rejected.
Camera Ready
Camera-ready papers will be collected after the conference. We will collect the camera-ready version, the source files, and the license-to-publish form together, and we will send instructions on how to submit these files.
Workshop (Full Day)
The workshop will take place on September 19th, 2025, co-located with ECML-PKDD 2025, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (see the program above).
At least one author of each accepted paper must be registered.
For any further questions, please refer to the conference guidelines or contact us.