The accepted papers are publicly available on OpenReview: https://openreview.net/group?id=ICLR.cc/2025/Workshop/SSI-FM/Publication_Chairs#tab-accepted-submissions
(Oral) Can Language Models Falsify? The Need for Inverse Benchmarking
(Oral) Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
(Oral) A Self-Improving Coding Agent
(Oral) Demystifying Long Chain-of-Thought Reasoning in LLMs
(Oral) AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement
(Oral) An Architecture Search Framework for Inference-Time Techniques
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
Assessing Diversity Collapse in Reasoning
MPAW: Multi-Preference Alignment through Weak Model Collaboration for Efficient and Flexible LLM Decoding
Understanding the Capabilities and Limitations of Weak-to-Strong Generalization
SCOPE: Improving LLM Conversations with Efficient Semantic Space Planning
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
Adaptively-Labeled Vision Datasets Via Instance-Level Retrieval
DISC: Dynamic Decomposition Improves LLM Inference Scaling
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
Self-correction for OOD generalization
Exploring the Pre-conditions for Memory-Learning Agents
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
LaMsS: When Large Language Models Meet Self-Skepticism
Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models
Aviary: Training Language Agents on Challenging Scientific Tasks
KernelBench: Can LLMs Write Efficient GPU Kernels?
Multi-Agent Verification: Scaling Test-Time Compute with Goal Verifiers
Great Models Think Alike and this Undermines AI Oversight
Moral Intrinsic Rewards for Automated Alignment of LLM Agents
Training a Generally Curious Agent
How to Mitigate Overfitting in Weak-to-strong Generalization?
Yes, Q-learning Helps Offline In-Context RL
MetaSC: Test-Time Safety Specification Optimization for Language Models
HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
Self-Taught Self-Correction for Small Language Models
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
AMPO: Active Multi Preference Optimization for Self-play Preference Selection
Automated Capability Discovery via Model Self-Exploration
Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models
Towards Internet-Scale Training For Agents
Solving Robotic Tasks via Self-Adapting Improvement Loops with Internet Video Knowledge
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
An Adversarial Collaborative Framework for Comprehensive Image Captioning
AIDE: Agentically Improve Visual Language Model with Domain Experts
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
ReSL: Enhancing Deep Clustering Through Reset-based Self-Labeling
D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
Vision-Language Model Dialog Games for Self-Improvement
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
Preference Tree Optimization: Enhancing Goal-Oriented Dialogue with Look-Ahead Simulations
Improving Test-Time Search for LLMs with Backtracking Against In-Context Value Verifiers
Multi-Turn Code Generation Through Single-Step Rewards
Value-Based Deep RL Scales Predictably
Game-Theoretic Regularized Self-Play Alignment of Large Language Models
MALT: Improving Reasoning with Multi-Agent LLM Training
Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
Don't Throw Away Data: Improving Sequence Knowledge Distillation with Minimum Bayes Risk Decoding
Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals
Mitigating Short Board Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization
Safety is Essential for Responsible Open-Ended Systems
Self-Improving Diffusion Models With Synthetic Data
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
RMBoost: Reward Model Training With Preference-Conditional Multi-Aspect Synthetic Data Generation
Scalable Thompson Sampling via Ensemble++
Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
Self-Correcting Self-Consuming Loops For Generative Model Training
Natural Language Reinforcement Learning
Scaling Flaws of Verifier-guided Search in Mathematical Reasoning
Optimizing Test-Time Compute via Meta Reinforcement Finetuning
Boss LLM: Adaptation via No-Regret Learning
Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension