International Workshop on Learning from Small Data (LFSD 2025)
Co-Located With ECML-PKDD 2025
September 19th, 2025
Photo by Acediscovery. License: CC-BY-4.0
As machine learning continues to advance, the challenge of learning from small datasets has emerged as a critical area of research. In many real-world scenarios, obtaining large volumes of high-quality training data is often impractical or impossible. This limitation can stem from various factors, such as constraints in data collection, the need for early predictions in emerging domains, the absence of historical data for novel problems, or situations where available data originates from disparate tasks or contexts. These challenges underscore the importance of developing innovative approaches that enhance the entire lifecycle of machine learning models.
This workshop aims to explore innovative methodologies for effectively utilizing small data across various domains. A central focus will be on developing strategies and frameworks to overcome the challenges posed by limited data availability. We will examine how incorporating domain knowledge—such as principles from physics or other scientific fields—can guide model development, enabling robust learning even in data-scarce environments. Further, we will highlight advanced techniques, including data augmentation, semi-supervised learning, pseudo-labeling, active learning, few-shot learning, self-supervised learning, in-context learning, transfer learning, and other related topics, demonstrating their potential to improve learning outcomes when data is limited. Additionally, the workshop will address the role of experimental design within the broader research landscape.
By gathering researchers and practitioners from multiple fields, this workshop seeks to foster collaboration and cross-pollination of ideas. We encourage contributions that introduce novel problems, propose innovative solutions, or share practical experiences related to the small data challenge. Together, we will advance the understanding and application of learning from small datasets, paving the way for more robust and adaptable machine learning systems.
Workshop Topics
This full-day workshop addresses the challenges and opportunities of learning from small data, focusing on methodologies that optimize the entire learning pipeline under data-scarce conditions. The workshop will bring together researchers and practitioners from diverse disciplines, including physics-informed machine learning, few-shot learning, in-context learning, data augmentation, active learning, semi-supervised learning, transfer learning, and experimental design and optimization. Additionally, we will explore the role of surrogate models, which reduce the need for extensive data collection by approximating expensive-to-evaluate functions, making them particularly useful when data availability is limited. These approaches, among other solutions for learning in data-scarce environments, will serve as the foundation for sharing innovative techniques, discussing best practices, and exploring the frontiers of learning from small data. By emphasizing techniques that minimize reliance on large datasets, the workshop seeks to inspire advances in fields where data is inherently limited or costly to obtain.
We invite contributions that introduce novel problem settings, propose groundbreaking methodologies, or share practical experiences in deploying systems designed to operate effectively with small data. Submissions that highlight interdisciplinary approaches or address domain-specific challenges in small data learning are particularly encouraged. Additionally, we welcome thought-provoking questions or critical perspectives that aim to spark discussion and push the boundaries of this important research area. In particular, we invite contributions that address aspects including, but not limited to:
Physics and Math-Informed Machine Learning Techniques
Methods for employing physical laws to enhance learning with limited data
Methods for applying physics-based constraints, such as boundary conditions and physics-informed loss terms, in the learning process (see the sketch after this list)
Methods for handling ODEs and PDEs to model dynamic physical systems effectively
Methods for integrating control systems within learning frameworks to optimize performance and stability
Applications of these techniques in fields such as fluid dynamics and climate modeling
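To make these items concrete, here is a minimal physics-informed training sketch, assuming PyTorch and the toy decay ODE du/dt = -k·u; the network size, collocation grid, and equal loss weighting are illustrative assumptions, not recommendations:

```python
# Minimal physics-informed loss sketch (assumes PyTorch; toy ODE du/dt = -k*u).
# The ODE residual on unlabeled collocation points regularizes the model,
# which is what keeps learning stable when only a few labeled points exist.
import torch

torch.manual_seed(0)
k = 1.5                                       # known decay rate (domain knowledge)
t_data = torch.rand(8, 1)                     # only 8 labeled measurements
u_data = torch.exp(-k * t_data)               # toy observations of u(t)
t_phys = torch.linspace(0, 1, 64).reshape(-1, 1).requires_grad_(True)

model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(2000):
    opt.zero_grad()
    data_loss = torch.mean((model(t_data) - u_data) ** 2)
    u = model(t_phys)                         # predictions on collocation points
    du_dt = torch.autograd.grad(u.sum(), t_phys, create_graph=True)[0]
    phys_loss = torch.mean((du_dt + k * u) ** 2)   # residual of du/dt = -k*u
    (data_loss + phys_loss).backward()        # equal weighting is a free choice
    opt.step()
```

The same pattern extends to PDEs, with boundary-condition terms added to the loss alongside the residual.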
Transfer Learning, Data Augmentation, Generating Synthetic Data, Few-Shot Learning, Active Learning, and In-Context Learning
Methods for adapting pre-trained models to improve performance with limited data
Techniques for applying data augmentation to artificially expand dataset size and diversity
Strategies for few-shot learning to enable effective learning with minimal labeled examples
Meta-learning frameworks to improve few-shot learning performance across diverse tasks
Methods for generating high-quality synthetic data to enrich existing datasets
Strategies for active learning to prioritize the most informative data points for labeling (a pool-based sketch follows this list)
Novel active learning strategies that operate under a low query budget
New methods for applying active learning in high-dimensional spaces (e.g., deep active learning)
Methods for selectively acquiring features to maximize information gain from small datasets
Strategies for effective label acquisition in multi-label active learning
Techniques for in-context learning to adapt models dynamically based on provided contextual information
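As one concrete instance of the active learning items above, the following pool-based uncertainty-sampling loop is a minimal sketch, assuming scikit-learn; the synthetic dataset, logistic-regression learner, and 20-query budget are illustrative assumptions:

```python
# Pool-based uncertainty sampling sketch (assumes scikit-learn is installed;
# dataset, model, and query budget are illustrative, not prescriptive).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
labeled = [int(np.flatnonzero(y == c)[0]) for c in (0, 1)]  # one seed label per class
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(20):                                  # low query budget: 20 queries
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    query = pool[int(np.argmin(proba.max(axis=1)))]  # least-confident pool point
    labeled.append(query)                            # the oracle reveals y[query]
    pool.remove(query)

clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
print("accuracy with", len(labeled), "labels:", clf.score(X[pool], y[pool]))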
Semi-Supervised, Self-Supervised, Weak, and Federated Learning
Methods that use both labeled and unlabeled data to improve model performance in data-scarce environments
Generating pseudo-labels from model predictions to enhance learning from unlabeled data (see the sketch after this list)
Combining weak classifiers to create robust and generalizable ensemble models
Training models collaboratively across distributed devices while preserving data privacy
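For the pseudo-labeling item above, here is a minimal sketch using scikit-learn's SelfTrainingClassifier; the synthetic dataset, the 90% of labels hidden, and the 0.9 confidence threshold are illustrative assumptions:

```python
# Pseudo-labeling via self-training (assumes scikit-learn is installed).
# Unlabeled points are marked with -1; predictions above the confidence
# threshold are promoted to pseudo-labels and used for further training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_train = y.copy()
hidden = rng.choice(len(X), size=450, replace=False)
y_train[hidden] = -1                          # hide 90% of the labels

self_training = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000), threshold=0.9
)
self_training.fit(X, y_train)
print("accuracy on all points:", self_training.score(X, y))
```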
Challenges and Advances in Learning from Data-Scarce Environments
Methods capable of handling high-dimensional data efficiently in data-scarce environments
New human-in-the-loop learning and application scenarios, e.g., brain-computer interfaces and crowdsourcing
Techniques for adaptively learning from continuous data streams
Strategies for real-world deployment in dynamic environments
Methods for handling imbalanced data, data drift, and noisy labels (a class-weighting sketch follows this list)
Developing scalable frameworks for large-scale data streams
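As a minimal illustration of the imbalanced-data item above, the sketch below compares a plain classifier with a class-weighted one, assuming scikit-learn; the synthetic dataset and 95/5 imbalance are illustrative assumptions:

```python
# Class-weighting sketch for imbalanced small datasets (assumes scikit-learn).
# Reweighting the loss by inverse class frequency is a simple baseline before
# resorting to resampling or synthetic minority oversampling.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# small sample with a 95/5 class imbalance
X, y = make_classification(n_samples=200, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

print("plain   :", balanced_accuracy_score(y_te, plain.predict(X_te)))
print("weighted:", balanced_accuracy_score(y_te, weighted.predict(X_te)))
```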
Design of Experiments and Surrogate Models in Small Data Contexts
Methods for optimizing experimental design under limited data
Integrating prior knowledge and constraints for efficiency
Adaptive experimental designs that adjust dynamically as new observations arrive
Selecting the most informative experimental conditions
Leveraging surrogate models (e.g., Gaussian processes, Bayesian optimization) to reduce data collection needs (a sketch follows this list)
Applications of surrogate-assisted design in experiments, simulations, and optimization
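To illustrate surrogate-assisted design, here is a minimal Bayesian-optimization-style sketch, assuming scikit-learn and SciPy: a Gaussian process is fit to a handful of evaluations of a hypothetical expensive experiment, and the next run is chosen by expected improvement. The objective function, RBF kernel, and candidate grid are illustrative assumptions:

```python
# Surrogate-assisted experiment selection sketch (assumes scikit-learn, SciPy).
# A Gaussian process fit to a few evaluations proposes the next experiment via
# expected improvement, so far fewer real evaluations are needed.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                                  # hypothetical expensive experiment
    return np.sin(3 * x) + x

X = np.array([[0.2], [1.0], [2.5]])        # three initial runs
y = f(X).ravel()
candidates = np.linspace(0, 3, 200).reshape(-1, 1)

for _ in range(5):                         # five more runs, chosen adaptively
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[[np.argmax(ei)]]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best observed value:", y.max())
```

In practice the candidate grid would be replaced by a proper acquisition optimizer, but the data-saving logic is the same: each new experiment is placed where the surrogate expects the most gain.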
Important Dates:
Submission open: Thursday, May 15, 2025
Paper Submission Deadline: Saturday, June 14, 2025
Author Notification: Monday, July 14, 2025 (ECML-PKDD offers the early bird registration rate until Tuesday, July 22, 2025)
Camera Ready: Friday, September 5, 2025
Workshop: Friday, September 19, 2025 [At least one author of each accepted paper must be registered.]
Submission Instructions
We invite submissions of original work in the form of regular papers (8–16 pages) or extended abstracts (2–4 pages). All submissions will undergo a double-blind peer-review process, and accepted papers will be presented and discussed at the workshop. Extended abstracts are particularly suited for works-in-progress or industrial case studies. At least one author of each accepted paper is required to register for the conference and attend the workshop.
We plan to publish the workshop proceedings with Springer. Submissions will be made via the CMT system of the ECML-PKDD conference, and papers should be formatted in the Lecture Notes in Computer Science (LNCS) style. Springer's proceedings LaTeX templates are also available on Overleaf.
Reviews are double-blind; papers must not include information that reveals the authors' identities.
Committee
Organizing Committee (alphabetical, first name):
Alaa Tharwat: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Barbara Hammer: Bielefeld University, Germany
Bjarne Jaster: Bielefeld University, Germany
Markus Lange-Hegermann: OWL University of Applied Sciences and Arts, Lemgo, Germany
Michiel Straat: Bielefeld University, Germany
Wolfram Schenck: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Steering Committee:
Thorben Markmann: Bielefeld University, Germany
Daniel Leite: Paderborn University, Germany
Tarek Gaber: University of Salford, UK
Program Committee:
Essam Rashed: University of Hyogo, Japan
Andreas Rosskopf: Fraunhofer Institute for Integrated Systems and Device Technology IISB, Germany
Annika Junker: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Samir Moustafa: University of Vienna, Austria
Mohammed Fellaji: CentraleSupélec, Université Paris-Saclay, CNRS, LORIA, France
Peter Kuchling: Bielefeld University of Applied Sciences and Arts (HSBI), Germany
Andreas Besginow: OWL University of Applied Sciences and Arts, Lemgo, Germany
Christoph Berganski: Paderborn University, Paderborn, Germany
Dina Elmanakhly: Suez Canal University, Egypt
Hans Harder: Paderborn University, Paderborn, Germany
Jörn Tebbe: OWL University of Applied Sciences and Arts, Lemgo, Germany
Mona Selim: Suez Canal University, Egypt
Neevkumar Hareshbhai Manavar: Bielefeld University of Applied Sciences and Arts (HSBI), Bielefeld, Germany
Sameh Basha: Faculty of Science, Cairo University, Cairo, Egypt
Program of the Workshop
Time | Program | Presenter / Author
9:00 - 9:05 | Opening and Welcome |
9:05 - 9:30 | Masked Autoencoder Self-Pretraining for Defect Detection in Microelectronics (Ref: 211) | Nikolai Röhrich
9:30 - 9:55 | Learning from Less: Synthetic Clinical Data Augmentation for Predicting Cardiac Decompensation and Pulmonary Exacerbation (Ref: 56) | Pedro Matias
9:55 - 10:20 | Data Augmentation Using Diffusion Models with Geometric Pattern Masks for Industrial Defect Detection (Ref: 142) | Masaya Oirase
10:20 - 11:20 | First Poster Session + Coffee Break |
11:20 - 11:45 | Physics-Informed Diffusion Models for Unsupervised Anomaly Detection in Multivariate Time Series (Ref: 256) | Juhi Soni
11:45 - 12:10 | Improving Neural Network-Based Material Simulations with Domain-Specific Data Filtering and Atom-Specific Training (Ref: 151) | Meguru Yamazaki
12:10 - 12:35 | Evaluating Spatiotemporal Prediction Models in a Low-Data Regime (Ref: 153) | Andrzej Dulny
12:35 - 13:00 | Label Augmentation with Reinforced Labeling for Weak Supervision (Ref: 170) | Fabio Maresca
13:00 - 14:00 | Lunch Break |
14:00 - 14:25 | Should We Still Let Random Sampling Guide Model Performance? Investigating Exemplar Selection for Few-Shot Named-Entity Recognition (Ref: 174) | Quentin Telnoff
14:25 - 14:50 | Tailored Transformation Invariance for Industrial Anomaly Detection (Ref: 179) | Mariette Schönfeld
14:50 - 15:15 | Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations (Ref: 145) | Naoki Matsumura
15:15 - 15:40 | Varying Informativeness of Inductive Bias in Gaussian Processes Regression for Small Data (Ref: 237) | Andreas Besginow
15:40 - 16:05 | Active Learning for Cheap RUL Prediction in the CMAPSS Dataset (Ref: 244) | Esmaail Albarazi
16:05 - 16:30 | Learning Local and Global Prototypes with Optimal Transport for Unsupervised Anomaly Detection and Localization (Ref: 255) + Closing | Robin Trombetta
LNCS Style
The paper must be written in English and submitted as a PDF file in LNCS format.
You can download the LaTeX template or edit the template directly in Overleaf.
Presentation
All accepted papers will be presented in spotlight talks and/or poster sessions.
At least one author of each accepted paper must be registered for ECML-PKDD.
ECML-PKDD offers early bird registration until Tuesday, July 22nd, 2025.
Indexed Publishing
All accepted papers will be published in the ECML-PKDD joint post-workshop proceedings, which are indexed, e.g., by Google Scholar.
Dual Submission Policy
Submissions should report original work. Submissions that are identical or substantially similar to papers that have been published, have been submitted elsewhere, or are submitted elsewhere during the review period, will be rejected.
Camera Ready
Camera-ready papers will be collected after the conference. We will collect the camera-ready version, the source files, and the license-to-publish form together, and we will send instructions on how to submit these files.
Workshop (Full Day)
The workshop will take place on September 19th, 2025, co-located with ECML-PKDD 2025, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (see the program above).
At least one author of each accepted paper must be registered.
For any further questions, please refer to the conference guidelines or contact us.