Code will be released upon acceptance.
With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: fewer diffusion queries lead to larger trajectory chunks, which in turn accumulate tracking errors. To overcome these challenges, we introduce WARPD (World-model Assisted Reactive Policy Diffusion), a method that directly generates closed-loop policies (the weights of neural policies) instead of open-loop trajectories. By learning behavioral distributions in parameter space rather than trajectory space, WARPD offers two major advantages: (1) extended action horizons with robustness to perturbations, while maintaining high task performance, and (2) significantly reduced inference costs. Empirically, WARPD outperforms DP in long-horizon and perturbed environments, and achieves multitask performance on par with DP while requiring only about 1/45th of the inference-time FLOPs per step.
The generated policy can run a closed loop for longer action horizons, thereby allowing for fewer diffusion model queries.
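The control pattern described above can be illustrated with a minimal sketch. It is not the WARPD implementation: `generate_policy_weights` is a hypothetical stand-in that returns random MLP weights where a diffusion model would generate them, and `toy_env_step` is an assumed toy dynamics function. The sketch only shows the structure of the loop: the weight generator is queried once per action horizon, and in between the generated policy acts closed-loop on the current observation.

```python
import numpy as np


def generate_policy_weights(obs, rng, obs_dim, act_dim, hidden=16):
    """Hypothetical stand-in for the weight generator.

    A real system would sample these weights from a diffusion model;
    here they are random, purely to make the loop runnable.
    """
    return {
        "W1": rng.normal(0.0, 0.1, (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, act_dim)),
        "b2": np.zeros(act_dim),
    }


def policy_action(weights, obs):
    """Small MLP policy: maps the current observation to an action."""
    h = np.tanh(obs @ weights["W1"] + weights["b1"])
    return np.tanh(h @ weights["W2"] + weights["b2"])


def toy_env_step(obs, action, rng, noise=0.01):
    """Assumed toy dynamics: the state moves by the action plus noise."""
    return obs + 0.1 * action + rng.normal(0.0, noise, obs.shape)


def rollout(total_steps, action_horizon, obs_dim=2, seed=0):
    """Run the loop and count how often the generator is queried."""
    rng = np.random.default_rng(seed)
    obs = np.zeros(obs_dim)
    weights = None
    num_queries = 0
    for t in range(total_steps):
        # Query the (expensive) generator only once per action horizon.
        if t % action_horizon == 0:
            weights = generate_policy_weights(obs, rng, obs_dim, act_dim=obs_dim)
            num_queries += 1
        # Closed loop: the action depends on the current observation,
        # so the policy can react to noise or perturbations at every step.
        action = policy_action(weights, obs)
        obs = toy_env_step(obs, action, rng)
    return obs, num_queries


if __name__ == "__main__":
    for horizon in (64, 128):
        _, queries = rollout(total_steps=256, action_horizon=horizon)
        print(f"action horizon {horizon}: {queries} generator queries")
```

In this sketch, doubling the action horizon halves the number of generator queries over the same episode length, while the per-step cost is that of a small MLP forward pass.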
Below are demonstrations of WARPD trained on PushT trajectory data at different action horizons.
Action Horizon 64
Action Horizon 128
Action Horizon 246
Drawer close
Drawer open
Button press
Peg insert side
Push
Pick and place
Window close
Door open
Reach
Window open
PushT task: WARPD with perturbation: 50
PushT task: DP with perturbation: 50
Can task: WARPD with perturbation: 3
Can task: DP with perturbation: 3
Lift task: WARPD with perturbation: 3
Lift task: DP with perturbation: 3