arXiv (PDF) | Code (coming soon)
Automatic Curriculum Learning Through Imagined Rollouts
We demonstrate how IMAC (Imagined Autocurricula) progressively extends planning horizons during training across six Procgen environments. Each row shows the same game at three training stages: early (short horizon), mid (medium horizon), and late (long horizon). The agent begins with short-horizon, reactive behavior and gradually learns to plan over longer time horizons without manual tuning. This automatic curriculum emerges through Prioritized Level Replay applied to imagined rollouts from a diffusion-based world model trained on offline data. The progression from simple to complex scenarios occurs naturally as the system prioritizes states with higher learning potential, enabling strong generalization to unseen test levels.
Abstract
Training agents to act in embodied environments typically requires vast training data or access to accurate simulation, neither of which exists for many cases in the real world. Instead, world models are emerging as an alternative: leveraging offline, passively collected data, they make it possible to generate diverse worlds for training agents in simulation. In this work, we harness world models to generate “imagined” environments to train robust agents capable of generalizing to novel task variations. One of the challenges in doing this is ensuring the agent trains on useful generated data. We thus propose a novel approach, IMAC (Imagined Autocurricula), leveraging Unsupervised Environment Design (UED), which induces an automatic curriculum over generated worlds. In a series of challenging, procedurally generated environments, we show it is possible to achieve strong transfer performance on held-out environments having trained only inside a world model learned from a narrower dataset. We believe this opens the path to utilizing larger-scale, foundation world models for generally capable agents.
Approach
Our method employs a three-stage pipeline that combines offline data collection, world model learning, and agent training in imagination through Prioritized Level Replay (PLR).
In the first stage, we perform diverse data collection by executing a behavioral policy πθ across varied environments to gather comprehensive state-action-reward-next state tuples, which are stored in an offline dataset buffer.
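As a concrete illustration of this first stage, below is a minimal sketch of the collection loop, assuming a Gymnasium-style environment API and a generic behavioral policy exposed as a callable; it is not the paper's actual data-collection code.

```python
# Minimal sketch of stage 1 (offline data collection), assuming a Gymnasium-style
# environment and a behavioral policy pi_theta exposed as a callable.
def collect_offline_dataset(env, behavior_policy, num_steps):
    """Roll out the behavioral policy and store (s, a, r, s', done) tuples."""
    buffer = []
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = behavior_policy(obs)                  # a ~ pi_theta(. | s)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset()[0] if done else next_obs     # start a new episode when one ends
    return buffer
```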
The second stage involves training a learned world model consisting of two key components: a state transition denoiser that captures environment dynamics and a reward/termination predictor that estimates task outcomes. These models are trained on the collected dataset and subsequently frozen to ensure consistency during agent training.
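The sketch below illustrates the interfaces of these two components as small PyTorch modules over flattened observations. The actual denoiser in the paper is diffusion-based, so this is only a stand-in for its interface, and the MLP architecture and layer sizes are assumptions.

```python
# Hedged sketch of stage 2 (world model components); hidden sizes and MLP
# architecture are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class TransitionDenoiser(nn.Module):
    """Predicts a denoised next state from (noisy next state, current state, action)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, noisy_next_obs, obs, action):
        return self.net(torch.cat([noisy_next_obs, obs, action], dim=-1))

class RewardTerminationHead(nn.Module):
    """Predicts reward and a termination logit from (state, action)."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                      # [reward, termination logit]
        )

    def forward(self, obs, action):
        out = self.net(torch.cat([obs, action], dim=-1))
        return out[..., 0], out[..., 1]

# After training on the offline dataset, both modules are frozen, e.g.:
#   for p in denoiser.parameters(): p.requires_grad_(False)
```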
In the final stage, we implement an imagined autocurriculum using PLR, where the agent learns by balancing between sampling from existing experiences in the PLR buffer and generating new synthetic rollouts through the learned world model.
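One way this replay-versus-generate decision could look is sketched below; the fixed replay probability and the helper callables are illustrative assumptions, not identifiers from the paper's code.

```python
# Sketch of the stage-3 sampling decision: replay a prioritized imagined level
# or generate a fresh rollout in the frozen world model. `sample_prioritized`
# and `imagine_rollout` are caller-supplied placeholders.
import random

def next_training_rollout(plr_buffer, sample_prioritized, imagine_rollout, p_replay=0.5):
    """Either replay a prioritized imagined level or imagine a new one."""
    if plr_buffer and random.random() < p_replay:
        # Exploit: revisit a level with high estimated learning potential.
        return sample_prioritized(plr_buffer)
    # Explore: roll out the current policy inside the learned world model.
    rollout = imagine_rollout()
    plr_buffer.append(rollout)                         # candidate for later prioritized replay
    return rollout
```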
This approach enables the agent to continuously update its experience buffer with both real and imagined trajectories, allowing for efficient exploration and learning without requiring constant interaction with the actual environment. The PLR mechanism prioritizes training on challenging scenarios while maintaining diversity, leading to robust policy learning across a wide range of task difficulties.
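For concreteness, the sketch below shows the standard rank-based prioritization used in PLR: levels are scored by an estimate of learning potential (for example, a value-loss statistic from the most recent rollout), ranked, and sampled from a temperature-smoothed rank distribution. IMAC's exact scoring choices may differ, and the score values in the example are made up.

```python
# Rank-based prioritization in the style of Prioritized Level Replay: higher
# learning-potential score -> lower rank -> higher replay probability.
import numpy as np

def rank_prioritized_probs(scores, beta=0.1):
    """P(level i) proportional to (1 / rank_i)^(1 / beta)."""
    scores = np.asarray(scores, dtype=np.float64)
    ranks = np.empty_like(scores)
    ranks[np.argsort(-scores)] = np.arange(1, len(scores) + 1)  # rank 1 = highest score
    weights = (1.0 / ranks) ** (1.0 / beta)
    return weights / weights.sum()

# Example: three imagined levels with (made-up) value-loss scores.
probs = rank_prioritized_probs([0.9, 0.1, 0.5])   # most mass on the 0.9-score level
```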
@inproceedings{güzel2025imac,
title={IMAC: Imagined Autocurricula},
author={Ahmet Hamdi Güzel and Matthew T. Jackson and Jarek L. Nielsen and Tim Rocktäschel and Jakob Foerster and Ilija Bogunovic and Jack Parker-Holder},
year={2025},
booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
url={https://arxiv.org/abs/2509.13341},
abstract={We introduce IMAC (Imagined Autocurricula), an approach that uses world models trained on offline data to generate diverse training environments with an automatic curriculum induced by Unsupervised Environment Design, enabling agents trained entirely inside the world model to generalize to held-out task variations.}
}