The Role of Domain Randomization in Training Diffusion Policies for Whole-Body Humanoid Control
Abstract
Humanoids have the potential to be the ideal embodiment in environments designed for humans. Thanks to their structural similarity to the human body, they benefit from rich sources of demonstration data, e.g., collected via teleoperation, motion capture, or even videos of humans performing tasks. However, distilling a policy from demonstrations remains a challenging problem. While Diffusion Policies (DPs) have shown impressive results in robotic manipulation, their applicability to locomotion and humanoid control remains under-explored. In this paper, we investigate how dataset diversity and size affect the performance of DPs for humanoid whole-body control. In a simulated IsaacGym environment, we generate synthetic demonstrations by training Adversarial Motion Prior (AMP) agents under various Domain Randomization (DR) conditions, and we compare DPs fitted to datasets of varying size and diversity. Our findings show that, although DPs can achieve stable walking behavior, successfully training locomotion policies requires significantly larger and more diverse datasets than manipulation tasks, even in simple scenarios.
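A minimal sketch of the demonstration-generation loop described above, assuming a Gym-style environment interface. `make_humanoid_env` and `load_amp_policy` are hypothetical placeholders for illustration, not the paper's code or the IsaacGym API; the DR condition names and dataset sizes mirror the ablation.

```python
# Sketch: generate (observation, action) datasets from trained AMP policies.
# The env/policy interfaces are assumptions, not the paper's implementation.
import numpy as np

def collect_demonstrations(env, policy, n_samples):
    """Roll out a trained AMP policy and record (observation, action) pairs."""
    obs_buf, act_buf = [], []
    obs = env.reset()
    while len(obs_buf) < n_samples:
        action = policy(obs)                      # deterministic AMP action
        next_obs, _, done, _ = env.step(action)   # Gym-style step (assumed)
        obs_buf.append(obs)
        act_buf.append(action)
        obs = env.reset() if done else next_obs
    return np.asarray(obs_buf), np.asarray(act_buf)

# Usage over DR conditions and dataset sizes (helpers below are hypothetical):
for name in ("none", "perturbation", "terrain"):   # illustrative subset of DR conditions
    env = make_humanoid_env(randomization=name)    # hypothetical env factory
    policy = load_amp_policy(f"amp_{name}.pt")     # hypothetical AMP checkpoint
    for size in (500_000, 2_000_000, 8_000_000):
        obs, act = collect_demonstrations(env, policy, size)
        np.savez(f"demos_{name}_{size}.npz", obs=obs, act=act)
```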
Key Contributions
One of the first ablation studies of the impact of DR on dataset generation for training DPs in humanoid control (a minimal DP training-step sketch follows this list)
Analysis of dataset size effects on training across different randomization techniques
Novel insights into data requirements for whole-body control versus manipulation tasks
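As a concrete reference for what training a DP entails here, below is a minimal sketch of one DDPM-style training step that regresses the noise added to demonstrated actions, conditioned on the observation. The MLP, dimensions, and noise schedule are illustrative assumptions, not the paper's architecture.

```python
# Sketch: one diffusion-policy training step (DDPM noise-prediction objective).
# All dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, T = 69, 21, 100            # illustrative humanoid dimensions
betas = torch.linspace(1e-4, 2e-2, T)        # standard DDPM beta schedule
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

# Noise predictor: consumes noisy action, observation, and a timestep embedding.
eps_net = nn.Sequential(
    nn.Linear(ACT_DIM + OBS_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(eps_net.parameters(), lr=1e-4)

def train_step(obs, act):
    """Corrupt demonstrated actions with noise and regress the added noise."""
    t = torch.randint(0, T, (act.shape[0],))
    a_bar = alphas_cum[t].unsqueeze(-1)                       # cumulative alpha per sample
    eps = torch.randn_like(act)
    noisy_act = a_bar.sqrt() * act + (1 - a_bar).sqrt() * eps # forward diffusion
    t_emb = (t.float() / T).unsqueeze(-1)                     # simple timestep embedding
    pred = eps_net(torch.cat([noisy_act, obs, t_emb], dim=-1))
    loss = nn.functional.mse_loss(pred, eps)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example with random tensors standing in for a demonstration minibatch:
loss = train_step(torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM))
```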
Results
Key Findings
DR is essential across all dataset sizes, even when evaluating in non-randomized environments.
Successful training requires significantly larger datasets (2M–8M samples) than manipulation tasks.
Perturbation and terrain randomization yield the strongest results, matching the performance of the source RL policy.
Performance in Non-randomized Environment
Dataset Size Impact:
500K samples: No DR configuration achieves stable walking
2M samples: Stable walking only under specific DR configurations
8M samples: Most DR configurations yield strong results
Performance in Randomized Environment
Dataset Size Impact:
500K samples: No DR configuration achieves stable walking
2M samples: Perturbation randomization yields strong results
8M samples: Perturbation and terrain randomization perform best (an evaluation sketch follows)
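A minimal sketch of the evaluation protocol implied by these comparisons: roll a policy out for a fixed horizon over many episodes and count those that end without a fall. The Gym-style interface and the `fell` info key are assumptions, not the paper's evaluation code.

```python
# Sketch: success-rate evaluation of a walking policy (interface assumed).
def walking_success_rate(env, policy, n_episodes=100, horizon=1000):
    """Fraction of episodes in which the humanoid walks the full horizon."""
    successes = 0
    for _ in range(n_episodes):
        obs, fell = env.reset(), False
        for _ in range(horizon):
            obs, _, done, info = env.step(policy(obs))
            if done:
                fell = info.get("fell", True)   # assumed fall/termination flag
                break
        successes += not fell
    return successes / n_episodes
```

The same routine can be run in the randomized and non-randomized environments to compare a Diffusion Policy against its source RL policy.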
Qualitative results (media omitted here): performance of the source RL policy and of the best Diffusion Policy in the randomized environment.