Machine learning theory has long focused on the classical supervised learning setting, where a model is trained on input–label pairs drawn from a well-defined data distribution, with the aim of achieving low test error on that distribution. Despite remarkable advances in this setting, recent breakthroughs in generative AI have transformed our understanding of generalization, revealing phenomena such as emergent capabilities and in-context learning that lie beyond the scope of existing theoretical frameworks. These empirical developments call for new theoretical paradigms and for closer interaction between theoreticians and practitioners to address the distinctive challenges posed by generative models.
The purpose of this workshop is to bring together diverse theory-oriented communities to articulate and synthesize core principles underlying modern generative AI, and to outline the central challenges in advancing our scientific understanding.
Date: December 7th, 2025
Location: Bella Center, Copenhagen, Denmark (Entrance 4, Amphi 15)
In this talk, I will review how concepts from optimal transport can be applied to analyze seemingly unrelated machine learning methods for sampling and training neural networks. The focus is on using optimal transport to study dynamical flows in the space of probability distributions. The first example will be sampling by flow matching, which regresses advection fields. In its simplest case (diffusion models), this approach exhibits a gradient structure similar to the displacement seen in optimal transport. I will then discuss Wasserstein gradient flows, where the flow minimizes a functional within the optimal transport geometry. This framework can be employed to model and understand the training dynamics of the probability distribution of neurons in two-layer networks. The final example will explore modeling the evolution of the probability distribution of tokens in deep transformers. This requires modifying the optimal transport structure to accommodate the softmax normalization inherent in attention mechanisms.
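For readers less familiar with the terminology, here is a minimal sketch, in conventional notation, of two objects the abstract refers to: the flow matching regression objective tied to a probability path via the continuity equation, and a Wasserstein gradient flow of a functional F over densities. The notation is an assumption; the formulations used in the talk may differ.

```latex
% Flow matching (standard form, assumed notation): regress a velocity field
% v_\theta onto a target field u_t that transports the probability path p_t,
% which satisfies the continuity equation.
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x \sim p_t}
    \left\| v_\theta(x, t) - u_t(x) \right\|^2,
\qquad
\partial_t p_t + \nabla \cdot \left( p_t\, u_t \right) = 0 .

% Wasserstein gradient flow (standard form): the density \rho_t descends a
% functional F in the optimal transport geometry, again via a continuity
% equation whose velocity is minus the gradient of the first variation of F.
\partial_t \rho_t
  = \nabla \cdot \left( \rho_t\, \nabla \frac{\delta F}{\delta \rho}[\rho_t] \right).
```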
09:30--10:00: Contributed talks:
Claudia Merger: Generalization Dynamics of Linear Diffusion Models
Yu-Han Wu: Optimal Stopping in Latent Diffusion Models
10:00--11:00: Break/Poster Session 1 (see detailed planning)
11:00--11:45: Plenary talk: Elizabeth Baker, Conditioning stochastic differential equations with score matching methods
Conditioning stochastic differential equations (SDEs), e.g. via Doob's h-transform, is an important problem, not just within generative modelling but also in many scientific areas. For example, evolutionary biology models the temporal evolution of the shapes of species via SDEs, and conditioning these SDEs thus blends mathematical models with observational data. In this talk, I will connect h-transforms to score-based diffusion models and discuss how to adapt score learning to conditioning SDEs, covering both finite- and infinite-dimensional settings. In other words, score matching can be a powerful tool for SDEs beyond its use in generative modelling, and this talk surveys the nuances of this relationship.
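As background, a hedged sketch of Doob's h-transform in conventional notation (the talk's finite- and infinite-dimensional formulations may differ): conditioning an SDE on an event at a terminal time adds a drift correction given by the gradient of log h, which is the kind of score that score matching can be trained to approximate.

```latex
% Doob h-transform (standard form, assumed notation). Start from an
% unconditioned SDE and condition on the event {X_T \in A}:
\mathrm{d}X_t = b(X_t, t)\,\mathrm{d}t + \sigma(X_t, t)\,\mathrm{d}W_t,
\qquad
h(x, t) = \mathbb{E}\!\left[ \mathbf{1}\{X_T \in A\} \mid X_t = x \right].

% The conditioned process solves the same SDE with an extra drift term
% \sigma\sigma^\top \nabla_x \log h, the "score" that score-matching-style
% objectives can learn.
\mathrm{d}X_t^{\star}
  = \left( b(X_t^{\star}, t)
      + \sigma\sigma^{\!\top}(X_t^{\star}, t)\, \nabla_x \log h(X_t^{\star}, t) \right) \mathrm{d}t
  + \sigma(X_t^{\star}, t)\,\mathrm{d}W_t .
```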
11:45--12:30: Plenary talk: Anej Svete, Diffusion Language Models: Problem Solving and Reasoning
Masked diffusion models (MDMs) offer a compelling alternative to traditional autoregressive language models. They generate strings by iteratively refining partially masked inputs in parallel. This makes them efficient, but their computational capabilities and the limitations inherent to the parallel generation process remain largely unexplored. In this talk, I will discuss what types of reasoning problems MDMs can provably solve and how efficiently they can do so. We will describe the relationship between MDMs and the well-understood reasoning frameworks of chain of thought (CoT) and padded looped transformers (LTs): MDMs and polynomially padded LTs are, in fact, equivalent, and MDMs can solve all problems that CoT-augmented transformers can. Moreover, we will showcase classes of problems (including regular languages) for which MDMs are inherently more efficient than CoT transformers, as parallel generation allows for substantially faster reasoning.
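To make the parallel generation process concrete, here is a minimal, hypothetical Python sketch of masked-diffusion-style decoding: all masked positions are predicted in parallel at each step, and the most confident predictions are committed. The `model` interface, the `toy_model` stand-in, and the unmasking schedule are assumptions for illustration, not the setup analyzed in the talk.

```python
import random

MASK = "<mask>"

def generate(model, length, num_steps=8):
    """Iteratively unmask a fully masked sequence (illustrative sketch only)."""
    seq = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Hypothetical model call: returns a (token, confidence) guess for every
        # masked position, computed in parallel from the current partial sequence.
        guesses = model(seq, masked)
        # Unmask a fraction of the remaining positions per step (simple linear
        # schedule), committing the most confident predictions first.
        k = max(1, len(masked) // (num_steps - step))
        ranked = sorted(zip(masked, guesses), key=lambda p: p[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:k]:
            seq[pos] = tok
    return seq

def toy_model(seq, masked_positions):
    """Stand-in model: random tokens with random confidences."""
    vocab = ["a", "b", "c"]
    return [(random.choice(vocab), random.random()) for _ in masked_positions]

if __name__ == "__main__":
    print(generate(toy_model, length=12))
```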
12:30--13:30: Lunch break
Poster size: A0 portrait or A1 landscape
Please see the OpenReview page for the list of accepted papers.
SPIGM@NeurIPS: Authors of papers accepted at the SPIGM@NeurIPS workshop who are unable to travel to the US are welcome to join us and present their work during the poster sessions.
Contact: prigm-eurips-2025@googlegroups.com