Scaling Self-Improving Foundation Models
Garnet 214--215
Date: April 27, 2025 (Sunday)
Overview
The availability of internet data, while vast, is ultimately finite, or at least growing at a pace that lags behind the consumption needs of foundation models (FMs) during pre-training. As is perhaps most evident with large language models (LLMs), even today the projected gains from scaling up pre-training on internet data are smaller than those from incorporating specific test-time techniques. It is projected that we will soon run out of high-quality data worth training on directly via next-token prediction. Similarly, real robot data in embodied or physical intelligence problems remains quite limited to date. In short, as FMs scale in size and capability, we will soon hit a ``data'' bottleneck blocking progress. To address this, machine learning techniques that enable models to self-improve, i.e., to continually improve beyond their initial training data, become essential. In principle, this can be done by training on self-generated or synthetic data that the same model (or other models) produce.
The unique challenges of self-improvement as a learning paradigm. The paradigm of training on self-generated synthetic data, or what we refer to as self-improvement, is distinct from standard supervised and reinforcement learning (RL) in several critical ways, as we discuss next. These differences underscore the need for a dedicated study of these topics. In supervised learning, models are trained on high-quality annotations from humans. Moreover, for pre-training of LLMs, high-quality data is often curated in heuristic ways that are largely independent of the learning algorithm. In contrast, self-improvement frameworks rely on the model's ability to generate its own training data (or use other models to generate this data), and thus the algorithm for data curation must now be subsumed by the learning framework. RL also involves training on a model's own generations, and as a result might appear similar to the self-improvement paradigm. However, due to its generality, a generic RL algorithm (designed to cater to all downstream RL problems) may not be tailored enough for self-improvement, which poses specific constraints and conditions on improving models. For instance, in contrast to an unpredictable external environment, the only randomness in the data-generation process for self-improving foundation models in many use cases is the inherent randomness in the model's own outputs. Furthermore, RL algorithms are typically meant to optimize rewards obtained from an accurate reward oracle, which is absent in the self-improvement paradigm. Here, we can only rely on querying learned verifiers or reward models, which can fail arbitrarily. In fact, unless carefully designed, self-improvement recipes can lead to model collapse with more training, a failure mode absent in traditional RL due to the presence of a meaningful reward signal. Thus, unlike RL, self-improvement algorithms cannot naively exploit the verification-generation gap. This necessitates research on self-improvement algorithms that also adapt to errors made by the learned evaluation model. We believe that such distinctions and specificity should lead to far more optimistic and tailored algorithms that are more effective than a generic RL approach.
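To make the contrast with standard RL concrete, below is a minimal, self-contained sketch of one round of verifier-filtered self-improvement (in the spirit of rejection fine-tuning). All names here (generate, verifier_score, finetune) are hypothetical placeholders rather than the API of any particular system; in particular, verifier_score stands in for a learned verifier that, unlike an RL reward oracle, can be arbitrarily wrong.
\begin{verbatim}
# Sketch of one round of self-improvement via verifier-filtered
# rejection sampling. Every function is a hypothetical placeholder.

import random

def generate(model, prompt, k=8):
    """Sample k candidate responses from the current model (placeholder)."""
    return [f"{prompt} -> candidate_{i}" for i in range(k)]

def verifier_score(response):
    """Learned verifier: returns an *estimated* quality score in [0, 1].
    Unlike an RL reward oracle, this estimate can be arbitrarily wrong."""
    return random.random()  # stand-in for a trained verifier / reward model

def finetune(model, data):
    """Placeholder for supervised fine-tuning on the filtered data."""
    print(f"fine-tuning on {len(data)} self-generated examples")
    return model

def self_improvement_round(model, prompts, threshold=0.7):
    # 1. The model generates its own candidate training data.
    # 2. A learned verifier (not a ground-truth oracle) filters it.
    # 3. The model is fine-tuned only on the accepted candidates.
    kept = []
    for prompt in prompts:
        for cand in generate(model, prompt):
            if verifier_score(cand) >= threshold:
                kept.append(cand)
    # If the verifier's errors dominate the filtered set, repeated rounds
    # can collapse the model rather than improve it.
    if not kept:
        return model
    return finetune(model, kept)

model = "base-model"
model = self_improvement_round(model, ["prompt_a", "prompt_b"])
\end{verbatim}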
Connections to safety and alignment: We would also like to clarify that this workshop is interested in understanding self-improvement principles for advancing safety and alignment (e.g., weak-to-strong generalization, multi-agent debate, etc.), as well as the implications of existing self-improvement techniques for the safety and alignment of these models (e.g., how can we understand behavior that evolves through self-improvement training, theoretical guarantees on the reliability of self-improvement training, alleviating value misalignment during self-improvement training, etc.).
We realize that powerful AI models will have societal and economic implications, and we are committed to encouraging the responsible use of self-improvement methods. A part of the workshop will serve as a venue to discuss the implications of using self-improvement methods to train models. We are also interested in understanding how self-improvement methods should be built responsibly, what testing criteria should be used to understand the behavior of these methods, and how to integrate safety and alignment as primary objectives when developing self-improvement methods.
Ethics Statement
We are committed to fostering responsible research and discussions around self-improvement that prioritize safety, transparency, and societal well-being. We expect most research discussions around the machine learning principles behind self-improvement methods to enhance our understanding of self-improvement as a community, which should open up more avenues to tackle the long-term catastrophic risks posed by these methods through an improved understanding of how they operate, where they break, and where misalignment is likely to happen. We believe these discussions should not pose any immediate risks and will help the community open the black box of self-improvement.
We also think safety is a core capability that the self-improvement community must study, and we will encourage workshop participants to discuss safety and ethical risks openly and to propose mitigation strategies that guide the responsible development of self-improving foundation models. This workshop will provide a venue for both capabilities and safety researchers to engage in an open discussion.
Goal of the workshop
This workshop focuses on developing machine learning principles and algorithms for enabling self-improvement in foundation models. We aim to bring together communities working on foundation models, reinforcement learning and online learning, and cognitive neuroscience, along with practitioners from various domains, to foster discussions and collaborations on several fundamental topics around this general theme of self-improvement, including but not limited to:
Learning objectives and algorithms: what should we learn? How should we supervise training?
Multi-agent and multi-model systems for enabling self-improvement
Training on machine-generated synthetic data without collapse
Autonomous online learning and reinforcement learning algorithms for FMs
Efficiently exploiting tools and external information for self-improvement
Theoretically characterizing conditions under which self-improvement is feasible, e.g., the verification-generation gap and the nature of problems where self-improvement is possible
Using weak supervision for improving strong models
Gains from training with self-improvement algorithms at inference time (e.g., computational benefits, performance benefits, etc.)
Limits of self-improvement training (e.g., when is expert data needed?)
Self-improvement for alignment and safety (synthetic data, test-time compute, weak-to-strong generalization)
Applications: software agents, robotic self-improvement, multi-modal systems, math, etc.
We are especially interested in downstream applications of self-improvement algorithms, and we explicitly encourage submissions that study applications of these algorithms in downstream problem domains. The composition of our speaker and organizer set covers different application areas of interest.
Invited Speakers/Panelists
MILA
Stanford
University of Washington
Microsoft Research
Organizers