Ada-Diffuser learns the causal generative process of decision-making (planning and policy learning) from demonstrations, through latent identification and the causal diffusion model. It is empirically effective in accurate latent inference, long-horizon planning, and adaptive policy learning, with theoretical guarantees for identifying the latent process.
Abstract
Recent work has framed decision-making as a sequence modeling problem using generative models such as diffusion models. Although promising, these approaches often overlook latent factors that exhibit evolving dynamics, elements that are fundamental to environment transitions, reward structures, and high-level agent behavior. Explicitly modeling these hidden processes is essential for both precise dynamics modeling and effective decision-making. In this paper, we propose a unified framework that explicitly incorporates latent dynamics inference into generative decision-making from minimal yet sufficient observations. We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously and leverages them for planning and control. With a proper modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and even recovering hidden action variables from action-free demonstrations. Extensive experiments on locomotion and robotic manipulation benchmarks demonstrate the model’s effectiveness in accurate latent inference, long-horizon planning, and adaptive policy learning.
Learning Framework
Ada-Diffuser is a general learning framework that can be adapted to various scenarios, including planning and policy learning, with flexible diffusion-based input and output designs.
Causal Diffusion Model: learns the generative process underlying the demonstrations through a "denoise-and-refine" mechanism during denoising and a "zig-zag" sampling process (§4.3).
Autoregressive Denoising:
Denoising proceeds autoregressively: later time steps in the trajectory are assigned higher noise levels, aligning the denoising direction with causal uncertainty and gradually refining from noisy futures to clearer pasts.
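The staggered noise assignment can be sketched as follows. This is a minimal illustration, not the schedule used in the paper: the function name `noise_levels`, the integer noise levels, and the linear one-level-per-position offset are all assumptions made for the example. It only shows the property stated above, that later positions in the window never carry less noise than earlier ones.

```python
def noise_levels(horizon, max_level, sweep):
    """Noise level per trajectory position at a given denoising sweep.

    Assumed (hypothetical) schedule: position i lags position i - 1 by one
    level, so earlier time steps become clean first. Level 0 means clean,
    max_level means pure noise.
    """
    return [min(max_level, max(0, max_level - sweep + i)) for i in range(horizon)]


# Four positions, three noise levels.
print(noise_levels(4, 3, 0))  # [3, 3, 3, 3]  everything starts as noise
print(noise_levels(4, 3, 3))  # [0, 1, 2, 3]  the past is clean, the future noisy
print(noise_levels(4, 3, 6))  # [0, 0, 0, 0]  the whole window is denoised
```

Under this assumed schedule, a window of length `horizon` is fully denoised after `max_level + horizon - 1` sweeps.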
Modeling the Latent Process inspired by Identification Theory (§3.2):
> We identify temporally coherent latent variables in a block-wise fashion, enabling structured modeling of the underlying generative process.
> The theoretical analysis shows that current and future observations are essential for accurate latent identification (§3.2). Accordingly, during denoising we adopt a denoise-and-refine strategy, ensuring that posterior-based inference leverages cleaner observations than prior-based generation. During sampling, we alternate between latent inference and observation refinement in a zig-zag manner, effectively integrating information from both directions in time.
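The alternation in zig-zag sampling can be illustrated with a toy loop. This is only a sketch of the control flow under strong simplifying assumptions: `infer_latent` and `refine_observations` are hypothetical stand-ins (a block mean and a step toward it) for the learned posterior estimator and the conditional denoiser, and a single scalar latent is assumed for the whole block.

```python
def infer_latent(block):
    # Hypothetical stand-in for posterior latent inference: the block mean.
    return sum(block) / len(block)


def refine_observations(block, latent, step_size=0.5):
    # Hypothetical stand-in for one latent-conditioned denoising step:
    # move each observation part of the way toward the latent estimate.
    return [x + step_size * (latent - x) for x in block]


def zigzag_sample(noisy_block, num_rounds=5):
    """Alternate latent inference and observation refinement."""
    block = list(noisy_block)
    latent = None
    for _ in range(num_rounds):
        latent = infer_latent(block)                # infer latent from current observations
        block = refine_observations(block, latent)  # refine observations given the latent
    return block, latent


block, latent = zigzag_sample([0.0, 2.0, 4.0])
```

Each round re-estimates the latent from observations that are cleaner than in the previous round, which is the point of interleaving the two steps rather than inferring the latent once from fully noisy inputs.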
Experiments
We conduct experiments across a diverse set of domains, including locomotion and continuous control (Cheetah, Walker, Ant), navigation (Maze2D), and robotic manipulation (Franka-Kitchen, Robomimic, LIBERO). Our evaluation covers both planning and policy learning settings, with and without explicitly defined latent processes—such as latent states, rewards, or actions. Notably, even without explicit latent variables, our model consistently improves decision-making, particularly in long-horizon tasks (§5, Appendix §D.1, F, I).
Learning with Explicit Latents
Group I: We consider the latent dynamics (wind speed) and latent rewards on Cheetah and Ant tasks.
Identification Results on Cheetah with latent dynamics
We verify our identification theory by varying the number of observation steps:
> 1–4 steps: Insufficient observations lead to poor latent identification, reflected by higher linear probing errors (MSE) and lower $R^2$ scores.
> 5–20 steps: Sufficient observations enable accurate identification and improved performance.
> more than 20 steps: Performance declines due to redundant information and increased optimization difficulty.
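The two identification metrics can be computed as in the sketch below. It assumes a scalar latent and an ordinary least-squares probe, and it fits and scores on the same samples for brevity; the function name `linear_probe` and the toy data are illustrative and not taken from the paper, and a real evaluation would score on held-out data.

```python
def linear_probe(estimated, true):
    """Fit true ~ slope * estimated + intercept; return (MSE, R^2).

    Assumes scalar latents and a non-constant `estimated` sequence.
    """
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var = sum((e - mean_e) ** 2 for e in estimated)
    slope = cov / var
    intercept = mean_t - slope * mean_e
    preds = [slope * e + intercept for e in estimated]
    ss_res = sum((t - p) ** 2 for t, p in zip(true, preds))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in true)            # total sum of squares
    return ss_res / n, 1.0 - ss_res / ss_tot


# A latent recovered up to an affine map probes perfectly.
mse_good, r2_good = linear_probe([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# A poorly identified latent leaves residual error and a lower R^2.
mse_bad, r2_bad = linear_probe([0.0, 1.0, 2.0, 3.0], [1.0, 7.0, 3.0, 5.0])
```

A linear probe is a natural choice here because identifiability results of this kind typically recover the latent only up to a simple transformation, so a raw comparison between estimated and true latents would understate identification quality.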
We also observe a strong correlation between latent identification quality and downstream policy performance—better identification leads to better policy learning and higher rewards.
Full results on Policy Learning and Identification are in Tables 1-2, Tables §A3-A4, and Fig §A.5.
Group II: We consider learning latent actions on robotic manipulation tasks.
Learning without Explicit Latents
The latent identification process functions similarly to Bayesian filtering, enabling the model to capture the underlying stochasticity present in demonstration data. This leads to more accurate and efficient decision-making. As a result, Ada-Diffuser becomes a general and flexible framework for a wide range of planning and control tasks (see §5, last group, and Appendix §D.3).
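For readers unfamiliar with the analogy, one predict-update step of a discrete Bayes filter looks like this. It is a generic textbook sketch and not the model's inference procedure; the two latent regimes and all probabilities are made-up numbers for illustration.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict-update step of a discrete Bayes filter.

    belief[i]        : current probability of latent state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the new observation under state j
    """
    n = len(belief)
    # Predict: propagate the belief through the latent dynamics.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: reweight by how well each state explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two hypothetical latent regimes (for example, low and high wind).
belief = bayes_filter_step(
    belief=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihood=[0.8, 0.2],
)
```

The belief sharpens toward the regime that better explains the observation while still respecting the latent dynamics, which is the behavior the comparison above refers to.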
Results on offline RL and planning benchmarks. Full results are in Tables §A5-A8.
Ada-Diffuser enables long-horizon planning, in terms of both the planning horizon and the action execution horizon, by modeling the causal generative process.
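The distinction between the two horizons can be made concrete with a toy receding-horizon loop. This sketch assumes a one-dimensional integer state and a trivial hypothetical planner `plan` standing in for the diffusion planner; only the relationship between `planning_horizon` (how far ahead each plan looks) and `execution_horizon` (how many planned actions are executed before replanning) is meant to carry over.

```python
def plan(state, goal, planning_horizon):
    # Hypothetical stand-in planner: move one unit toward the goal per step.
    actions = []
    s = state
    for _ in range(planning_horizon):
        a = (goal > s) - (goal < s)  # +1, -1, or 0 once the goal is reached
        actions.append(a)
        s += a
    return actions


def rollout(state, goal, planning_horizon, execution_horizon, max_steps=100):
    """Plan, execute the first `execution_horizon` actions, then replan."""
    replans = 0
    steps = 0
    while state != goal and steps < max_steps:
        actions = plan(state, goal, planning_horizon)
        replans += 1
        for a in actions[:execution_horizon]:
            state += a
            steps += 1
            if state == goal or steps >= max_steps:
                break
    return state, replans


print(rollout(0, 6, planning_horizon=8, execution_horizon=2))  # (6, 3)
print(rollout(0, 6, planning_horizon=8, execution_horizon=8))  # (6, 1)
```

A longer execution horizon means fewer calls to the planner, but it only pays off when the plan stays accurate far into the future.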