Diffusion-Based Control for Humanoid Tracking
The deployment of general-purpose humanoid robots requires control policies capable of executing a vast repertoire of dynamic motions with stability and realism. Traditional reinforcement learning (RL) often produces specialized "expert" policies that struggle to generalize across diverse tasks. This project addresses that limitation by developing a diffusion model that distills multiple RL expert policies into a single, robust controller. The objective is to enable the Unitree G1 humanoid to track a wide range of realistic motion trajectories, bridging the gap between training in simulation and agile performance on real hardware.
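To make the distillation idea concrete, the sketch below shows one DDPM-style training step in which a denoising network learns to recover an expert action from a noised copy, conditioned on the robot state. The denoiser callable, the simplified linear noise schedule, and the tensor shapes are illustrative assumptions rather than the project's actual design.

```python
import torch
import torch.nn.functional as F

def diffusion_distill_loss(denoiser, states, actions, num_steps: int = 100):
    """One DDPM-style training step on expert state-action pairs.
    `denoiser(noisy_action, state, t)` predicts the injected noise;
    it is a placeholder for the project's actual policy network."""
    b = actions.shape[0]
    t = torch.randint(0, num_steps, (b,), device=actions.device)
    # Simplified linear alpha-bar schedule (an assumption for this sketch).
    alpha_bar = (1.0 - (t.float() + 1.0) / num_steps).clamp(1e-3, 1.0)
    alpha_bar = alpha_bar.unsqueeze(-1)
    noise = torch.randn_like(actions)
    noisy_action = alpha_bar.sqrt() * actions + (1.0 - alpha_bar).sqrt() * noise
    return F.mse_loss(denoiser(noisy_action, states, t), noise)
```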
To achieve a cohesive policy, I first established a robust training foundation in IsaacGym: I configured the RL environment, defining the reward functions and observation spaces used to train expert policies with the PPO algorithm. Recognizing the computational demands of diffusion training, I then architected a high-efficiency data collection pipeline on the SEC server, implementing a batched parallel processing script that distributes rollouts across multiple GPUs to rapidly accumulate the state-action trajectories required for distillation. Finally, I addressed a critical motion retargeting issue in which the kinematic solver produced motions that clipped through the ground plane, devising a numerical fix that detects the penetration and offsets the affected motions along the z-axis to preserve simulation stability. Illustrative sketches of each component follow.
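As an example of the reward shaping involved, a typical motion-tracking term exponentiates the squared error between the simulated and reference states across all parallel environments. The shapes, weights, and sigma values below are illustrative assumptions, not the project's exact reward definition.

```python
import torch

def tracking_reward(
    joint_pos: torch.Tensor,      # (num_envs, num_dofs) simulated joint positions
    ref_joint_pos: torch.Tensor,  # (num_envs, num_dofs) reference motion targets
    root_vel: torch.Tensor,       # (num_envs, 3) simulated base linear velocity
    ref_root_vel: torch.Tensor,   # (num_envs, 3) reference base velocity
    sigma_pos: float = 0.25,
    sigma_vel: float = 0.5,
) -> torch.Tensor:
    """One scalar tracking reward per parallel environment."""
    pos_err = torch.sum((joint_pos - ref_joint_pos) ** 2, dim=-1)
    vel_err = torch.sum((root_vel - ref_root_vel) ** 2, dim=-1)
    return torch.exp(-pos_err / sigma_pos**2) + 0.5 * torch.exp(-vel_err / sigma_vel**2)
```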
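The parallel collection pipeline can be pictured as one rollout process per GPU, each pinned to its device and given a round-robin shard of motion clips. In this sketch, rollout_expert is a stub standing in for the actual IsaacGym rollout, and the GPU count and tensor shapes are placeholders.

```python
import os
import torch
import torch.multiprocessing as mp

NUM_GPUS = 4  # placeholder: set to the server's actual GPU count

def rollout_expert(motion_id: int):
    """Stub for the real IsaacGym rollout: runs the expert policy on one
    motion clip and returns (states, actions). Shapes are placeholders."""
    return torch.randn(500, 93), torch.randn(500, 29)

def worker(gpu_id: int, motion_ids: list, out_dir: str) -> None:
    # Pin this process to a single GPU before any CUDA context exists.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for motion_id in motion_ids:
        states, actions = rollout_expert(motion_id)
        torch.save({"states": states, "actions": actions},
                   os.path.join(out_dir, f"traj_{motion_id:05d}.pt"))

def collect(motion_ids: list, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    # Round-robin shard of motion clips across GPUs, one process per GPU.
    shards = [motion_ids[i::NUM_GPUS] for i in range(NUM_GPUS)]
    procs = [mp.Process(target=worker, args=(g, s, out_dir))
             for g, s in enumerate(shards)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fresh processes so the GPU pinning takes effect
    collect(list(range(100)), "trajectories")
```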
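The clipping fix itself reduces to a numerical check: find the lowest point any body reaches over the retargeted motion and, if it penetrates the ground plane, translate the entire trajectory upward along z. A minimal sketch, assuming world-frame body positions stored as a (T, num_bodies, 3) tensor:

```python
import torch

def fix_ground_clipping(body_pos: torch.Tensor, ground_z: float = 0.0,
                        clearance: float = 1e-3) -> torch.Tensor:
    """Shift a retargeted motion up along z if any body clips the ground.

    body_pos: (T, num_bodies, 3) world-frame positions over the motion.
    """
    min_z = body_pos[..., 2].min()          # lowest point across all frames
    penetration = ground_z - min_z
    if penetration > 0:                     # motion dips below the ground plane
        body_pos = body_pos.clone()
        body_pos[..., 2] += penetration + clearance
    return body_pos
```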
The initial expert policies have demonstrated strong performance, successfully tracking 86% of the motions in our database. To prepare for real-world deployment, I conducted rigorous inference-latency benchmarking on the robot's onboard computer, the NVIDIA Jetson Orin. Comparing deployment formats, I found that the TorchScript (.jit) export runs 0.5 ms faster per inference call than its ONNX counterpart, making TorchScript the better choice for our high-frequency control loop.
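The comparison can be reproduced with a simple timing harness that warms up each runtime and averages per-call latency. The file names, observation dimension, and iteration counts below are placeholders, and on a Jetson the ONNX side would use whichever execution provider is installed.

```python
import time
import numpy as np
import torch
import onnxruntime as ort

OBS_DIM = 93              # placeholder observation size
WARMUP, N_ITERS = 100, 1000

def bench_torchscript(path: str) -> float:
    model = torch.jit.load(path).cuda().eval()
    x = torch.randn(1, OBS_DIM, device="cuda")
    with torch.no_grad():
        for _ in range(WARMUP):
            model(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(N_ITERS):
            model(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / N_ITERS * 1e3   # ms per call

def bench_onnx(path: str) -> float:
    sess = ort.InferenceSession(path, providers=["CUDAExecutionProvider"])
    name = sess.get_inputs()[0].name
    x = np.random.randn(1, OBS_DIM).astype(np.float32)
    for _ in range(WARMUP):
        sess.run(None, {name: x})
    t0 = time.perf_counter()
    for _ in range(N_ITERS):
        sess.run(None, {name: x})
    return (time.perf_counter() - t0) / N_ITERS * 1e3   # ms per call

print(f"TorchScript: {bench_torchscript('policy.jit'):.3f} ms/call")
print(f"ONNX:        {bench_onnx('policy.onnx'):.3f} ms/call")
```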
This project establishes a scalable framework for advanced humanoid control, combining the precision of expert RL policies with the generalization capabilities of diffusion models. My contributions, from the core environment design and the parallel data pipeline to low-level inference optimization, have laid the groundwork for a unified policy capable of complex real-world locomotion. The next phase focuses on resolving the remaining kinematic constraints and executing a large-scale data collection campaign to complete the diffusion model training.