
The Role of Domain Randomization in Training Diffusion Policies for Whole-Body Humanoid Control

Abstract

Humanoids have the potential to be the ideal embodiment in environments designed for humans. Thanks to their structural similarity to the human body, they benefit from rich sources of demonstration data, e.g., collected via teleoperation, motion capture, or even videos of humans performing tasks. However, distilling a policy from demonstrations is still a challenging problem. While Diffusion Policies (DPs) have shown impressive results in robotic manipulation, their applicability to locomotion and humanoid control remains under-explored. In this paper, we investigate how dataset diversity and size affect the performance of DPs for humanoid whole-body control. In a simulated IsaacGym environment, we generate synthetic demonstrations by training Adversarial Motion Prior (AMP) agents under various Domain Randomization (DR) conditions, and we compare DPs fitted to datasets of different sizes and diversity. Our findings show that, although DPs can achieve stable walking behavior, successful training of locomotion policies requires significantly larger and more diverse datasets than manipulation tasks do, even in simple scenarios.
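Diffusion Policies produce actions by starting from Gaussian noise and denoising it step by step, conditioned on the current observation. Below is a minimal DDPM-style sampling sketch in NumPy. It is not the implementation used in this work. The linear noise schedule, the number of denoising steps, the action-chunk horizon, the dimensions, and the `eps_model` noise predictor are all assumptions. In the sketch, a zero-output stand-in replaces the trained network.

```python
import numpy as np

def make_schedule(num_steps: int = 50, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear DDPM noise schedule (an assumption; other schedules are common)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product used by the DDPM update
    return betas, alphas, alpha_bars

def sample_actions(eps_model, obs, action_dim, horizon, num_steps=50, rng=None):
    """Reverse diffusion: denoise a (horizon, action_dim) action chunk
    conditioned on obs. eps_model(a, obs, k) must return predicted noise."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)
    a = rng.standard_normal((horizon, action_dim))  # a_K ~ N(0, I)
    for k in reversed(range(num_steps)):
        eps = eps_model(a, obs, k)
        # DDPM posterior mean for step k
        mean = (a - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        if k > 0:
            # simple variance choice: sigma_k^2 = beta_k
            a = mean + np.sqrt(betas[k]) * rng.standard_normal(a.shape)
        else:
            a = mean  # no noise is added at the final step
    return a

# Usage with a hypothetical 32-D observation, 12-D actions, and an 8-step chunk.
rng = np.random.default_rng(0)
obs = np.zeros(32)
dummy_eps = lambda a, o, k: np.zeros_like(a)  # stand-in for a trained network
actions = sample_actions(dummy_eps, obs, action_dim=12, horizon=8, rng=rng)
```

In a real policy, `eps_model` would be a network trained to predict the noise added to demonstration actions. The quality of that network, and so of the sampled actions, depends on the dataset it was fitted to.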

Key Contributions

One of the first ablation studies of how DR during dataset generation affects the training of DPs for humanoid control
Analysis of how dataset size affects training across different randomization techniques
Novel insights into data requirements for whole-body control versus manipulation tasks

Results

Key Findings

DR is essential across all dataset sizes, even when evaluating in non-randomized environments.
Successful training requires significantly larger datasets (2M–8M samples) than manipulation tasks do.
Perturbation and terrain randomization produced the strongest results, matching the performance of the source RL policy.
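Perturbation and terrain randomization can be pictured as sampling new environment parameters at the start of each demonstration episode. The sketch below is hypothetical and does not call IsaacGym. The class names, the push interval, the push magnitude, and the terrain types are illustrative assumptions, not the settings used in this study. Pushes are modeled as random planar velocity kicks applied at random intervals.

```python
import random
from dataclasses import dataclass

@dataclass
class DRConfig:
    perturbations: bool = False
    terrain: bool = False
    push_interval_s: tuple = (5.0, 10.0)                # hypothetical range, seconds
    push_max_vel: float = 1.0                           # hypothetical, m/s
    terrain_types: tuple = ("flat", "rough", "slope")   # hypothetical terrain set

@dataclass
class EpisodeParams:
    terrain: str
    push_times_s: list
    push_vels: list

def sample_episode(cfg: DRConfig, episode_len_s: float, rng: random.Random) -> EpisodeParams:
    """Draw one episode's randomized conditions; with no DR enabled,
    this returns flat terrain and no pushes."""
    terrain = rng.choice(cfg.terrain_types) if cfg.terrain else "flat"
    push_times, push_vels = [], []
    if cfg.perturbations:
        t = rng.uniform(*cfg.push_interval_s)
        while t < episode_len_s:
            push_times.append(t)
            # random planar velocity kick (vx, vy)
            push_vels.append((rng.uniform(-cfg.push_max_vel, cfg.push_max_vel),
                              rng.uniform(-cfg.push_max_vel, cfg.push_max_vel)))
            t += rng.uniform(*cfg.push_interval_s)
    return EpisodeParams(terrain, push_times, push_vels)

rng = random.Random(0)
baseline = sample_episode(DRConfig(), episode_len_s=20.0, rng=rng)
randomized = sample_episode(DRConfig(perturbations=True, terrain=True), episode_len_s=20.0, rng=rng)
```

Each recorded demonstration would then be rolled out under its own sampled conditions. This is one way a randomized dataset covers more recovery situations than a non-randomized one.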

Performance in Non-randomized Environment

Dataset Size Impact:

500K samples: No stable walking achieved
2M samples: Stable walking with specific DR configurations
8M samples: Most randomizations produce strong results
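The snippet below converts the sample counts into hours of robot experience, which helps put these dataset sizes in perspective. It assumes that each sample is one control step and uses a hypothetical control frequency of 50 Hz. This page states neither, so the absolute values are illustrative only. The relative jump from 500K to 8M samples (16×) holds regardless of the frequency.

```python
CONTROL_HZ = 50  # hypothetical control frequency; not stated on this page

def samples_to_hours(num_samples: int, control_hz: float = CONTROL_HZ) -> float:
    """Hours of experience, assuming one (observation, action) sample per control step."""
    return num_samples / control_hz / 3600.0

for n in (500_000, 2_000_000, 8_000_000):
    print(f"{n:>9,} samples ~ {samples_to_hours(n):5.1f} h at {CONTROL_HZ} Hz")
```

Under this assumption, the three sizes correspond to roughly 2.8 h, 11.1 h, and 44.4 h of walking data.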

Performance in Randomized Environment

Dataset Size Impact:

500K samples: No configuration achieves stable walking
2M samples: Perturbation randomization yields strong results
8M samples: Perturbation and terrain randomization show the best performance

Figure: performance of the source RL policy in the randomized environment.

Figure: performance of the best Diffusion Policy in the randomized environment.