What does it mean for a generative model to truly generate? Despite the widespread use of generative models, there is no rigorous answer to this question: current definitions of generalization in generative AI are either qualitative or task-specific, and do not transfer across architectures.
I address this gap using the statistical physics of disordered systems, where the tools to define and locate phase transitions analytically are well developed. My central claim is that the memorization-to-generalization transition — the sharp threshold at which a model develops attractors near unseen examples — provides a definition that is geometric, quantitative, and architecture-independent.
Associative memories are models of creativity
A Hopfield network stores data as minima of an energy function, where each memory is a basin of attraction. Classically, it recalls only what it stored, and additional minima are deemed "spurious" and undesirable. In Negri et al. 2023 and Kalaj et al. 2024 we showed that when the stored examples share underlying features, the network also forms "spurious" but desirable attractors for examples it was never shown. We called this the memorization-to-generalization phase transition, and we successfully predicted it with spin-glass arguments.
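As a minimal sketch of the classical picture only, the code below stores random binary patterns with the standard Hebbian rule and recalls one of them from a corrupted copy. The random patterns, the sizes, and all function names are illustrative assumptions; this is not the structured-data setting analysed in Negri et al. 2023 or Kalaj et al. 2024.

```python
import numpy as np

def hebbian_couplings(patterns):
    """Hebbian couplings J = (1/N) sum_mu xi^mu (xi^mu)^T, with zero diagonal."""
    _, n = patterns.shape
    J = patterns.T @ patterns / n
    np.fill_diagonal(J, 0.0)
    return J

def energy(J, s):
    """Hopfield energy E(s) = -1/2 s^T J s."""
    return -0.5 * s @ J @ s

def recall(J, s, max_sweeps=50):
    """Zero-temperature asynchronous dynamics: align each spin with its
    local field, sweeping until no spin changes (a fixed point)."""
    s = s.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1 if J[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Illustrative setup: 5 random patterns of 200 spins (well below capacity).
rng = np.random.default_rng(0)
n, p = 200, 5
patterns = rng.choice([-1, 1], size=(p, n))
J = hebbian_couplings(patterns)

# Corrupt the first pattern by flipping 10% of its spins, then let it relax.
noisy = patterns[0].copy()
flipped = rng.choice(n, size=20, replace=False)
noisy[flipped] *= -1
recovered = recall(J, noisy)

# Overlap with the stored pattern: 1.0 means perfect recall.
overlap = recovered @ patterns[0] / n
print("overlap with stored pattern:", overlap)
print("energy before / after:", energy(J, noisy), energy(J, recovered))
```

The corrupted configuration starts inside the basin of the stored pattern, and each accepted spin flip can only lower the energy, so the dynamics slides down to the nearby minimum.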
Basins of attraction do not require an energy function
Explicit energy landscapes are restrictive. An important step was recognising that basins of attraction can be defined through conditional likelihood instead, with couplings inferred directly from data via pseudo-likelihood. In D'Amico et al. 2025 we showed that removing the energy constraints from the learning process makes it possible to learn associative memories with a generalization phase even on real data. More importantly, abandoning energy opened up the application of the memorization-to-generalization transition to state-of-the-art generative architectures.
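To make the idea of pseudo-likelihood inference concrete, here is a minimal sketch that assumes ±1 spins, plain gradient ascent, and a small L2 penalty. Each spin is predicted from all the others through P(s_i | rest) = sigmoid(2 s_i h_i), with h_i = sum_j J_ij s_j, and no symmetry is imposed on J, so no energy function is required. The synthetic data, the hyperparameters, and all names are illustrative assumptions; this is not the training procedure of D'Amico et al. 2025.

```python
import numpy as np

def log_pseudolikelihood(J, data):
    """Mean over examples of sum_i log P(s_i | other spins)."""
    h = data @ J.T  # h[mu, i] = sum_j J[i, j] * s[mu, j]
    # log sigmoid(2 s h) = -log(1 + exp(-2 s h))
    return -np.mean(np.sum(np.logaddexp(0.0, -2.0 * data * h), axis=1))

def fit_pseudolikelihood(data, lr=0.05, steps=300, l2=0.01):
    """Gradient ascent on the log pseudo-likelihood.
    J is left unconstrained (it may be asymmetric); only self-couplings are removed."""
    m, n = data.shape
    J = np.zeros((n, n))
    for _ in range(steps):
        h = data @ J.T
        # d/dJ[i, j] = mean_mu (s_i - tanh(h_i)) * s_j, minus the L2 penalty
        grad = (data - np.tanh(h)).T @ data / m - l2 * J
        J += lr * grad
        np.fill_diagonal(J, 0.0)
    return J

# Illustrative data: noisy copies of 3 random prototypes, 10% of spins flipped.
rng = np.random.default_rng(0)
n, m, k = 50, 400, 3
prototypes = rng.choice([-1.0, 1.0], size=(k, n))
labels = rng.integers(0, k, size=m)
noise = np.where(rng.random((m, n)) < 0.1, -1.0, 1.0)
data = prototypes[labels] * noise

ll_before = log_pseudolikelihood(np.zeros((n, n)), data)
J = fit_pseudolikelihood(data)
ll_after = log_pseudolikelihood(J, data)
print("log pseudo-likelihood before / after:", ll_before, ll_after)
```

With J = 0 every spin is predicted at chance, so the starting value is -n log 2. Because each row of J is fitted independently as a logistic regression, the learned matrix is in general not symmetric.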
Language diffusion models are creative associative memories
In Pham et al. 2026, we interpreted discrete diffusion language models as associative memories whose basins are formed by conditional likelihood maximization. The same memorization-to-generalization transition reappears: as data grows, basins around training examples contract while basins around unseen examples expand, until the two coincide. Surprisingly, the diffusion model starts generating good samples at around the memorization-to-generalization transition, suggesting a quantitative and geometric definition of generalization for this class of models.
What's next?
The results so far concern non-autoregressive models, where the uniform diffusion process makes the associative memory structure more evident. The natural next step — autoregressive transformers — brings a qualitatively different set of conceptual difficulties.
A central open question is whether the memorization-to-generalization transition survives in this setting and whether its critical threshold can be predicted analytically. To address it, we are developing extensions of classical physics-of-learning arguments to vector-spin variables, to model the interactions between tokens in their embedding space (Nicoletti et al. 2025).