Learning-Based Control for Dynamic Humanoid Locomotion
Goal of this project
Dynamic locomotion is a challenging control problem because it demands both robustness and accuracy. Reinforcement learning (RL) has been widely adopted to handle model uncertainties and environmental discrepancies, achieving robust locomotion. However, the black-box nature of neural networks complicates debugging, and training often requires tedious reward shaping and environment randomization driven by intuition. In addition, on uneven terrain where the safe stepping area is limited, reward signals become sparse, making it difficult for random exploration to discover optimal trajectories. Reference-guided RL is a promising way to mitigate these challenges: trajectory optimization (TO) provides dynamically feasible reference trajectories without the need for retargeting, and a planned trajectory that incorporates terrain information can serve as an effective motion primitive. Future work will explore the seamless integration of trajectory optimization with reinforcement learning to enhance the robustness and accuracy of perceptive locomotion.
Trajectory optimization
Reinforcement learning
Sim-to-sim validation
Why a hybrid control framework?
Preliminary Trials
1) Footstep-planner-guided reinforcement learning with momentum rewards
Implementation of the Angular Momentum-based Linear Inverted Pendulum (ALIP) planner in a learning framework (Isaac Lab)
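For reference, a minimal sketch of the sagittal ALIP placement law is shown below. This is not the project code: the point-mass model, the sign convention (p is the swing-foot position ahead of the CoM at touchdown), and the function name alip_foot_placement are assumptions for illustration.

```python
import numpy as np

def alip_foot_placement(x, Ly, Ly_des, m, H, T, g=9.81):
    """One-step ALIP foot placement in the sagittal plane.

    x      : CoM position relative to the current stance foot [m]
    Ly     : angular momentum about the stance foot [kg*m^2/s]
    Ly_des : desired angular momentum at the end of the NEXT step
    Returns p, the swing-foot position ahead of the CoM at touchdown,
    plus the predicted end-of-step state (x_T, Ly_T).
    """
    w = np.sqrt(g / H)          # LIP natural frequency
    mHw = m * H * w

    # Closed-form ALIP rollout over the step time T:
    #   x(t)  = x cosh(wt) + Ly/(mHw) sinh(wt)
    #   Ly(t) = mHw x sinh(wt) + Ly cosh(wt)
    x_T = x * np.cosh(w * T) + Ly / mHw * np.sinh(w * T)
    Ly_T = mHw * x * np.sinh(w * T) + Ly * np.cosh(w * T)

    # After touchdown the CoM sits at -p relative to the new stance foot,
    # and Ly carries over. Requiring Ly to reach Ly_des one step later:
    #   Ly_des = mHw * (-p) * sinh(wT) + Ly_T * cosh(wT)
    p = (np.cosh(w * T) * Ly_T - Ly_des) / (mHw * np.sinh(w * T))
    return p, x_T, Ly_T
```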
Results
Advantages
Capable of controlling walking parameters such as step time and step width (a usage sketch follows this list)
Enables prediction of future motions from planner outputs (optimized foot placement and center of mass)
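As a usage illustration of both advantages, the snippet below (hypothetical values throughout, reusing alip_foot_placement from the sketch above) treats step time and step width as runtime commands and reads back the predicted touchdown state, which is what enables the motion prediction noted above.

```python
# Hypothetical usage: step time and width are runtime commands, and the
# planner output doubles as a prediction of the next touchdown state.
m, H = 42.0, 0.62                       # assumed robot mass [kg] / CoM height [m]
v_des = 0.5                             # commanded forward speed [m/s]
for T, width in [(0.40, 0.20), (0.30, 0.25)]:
    Ly_des = m * H * v_des              # steady-state momentum target
    p_x, x_T, Ly_T = alip_foot_placement(x=0.02, Ly=10.0, Ly_des=Ly_des,
                                         m=m, H=H, T=T)
    p_y = width / 2.0                   # lateral offset alternates sign each step
    print(f"T={T:.2f}s width={width:.2f}m -> foot ({p_x:+.3f}, {p_y:+.3f}) m "
          f"from CoM; predicted CoM offset {x_T:+.3f} m")
```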
Disadvantages
Contact sequence and timing are fixed
Additional rewards are required for human-like upper-body motions
Future work
Trajectory Optimization with RL: Imitating dynamically feasible trajectories is expected to help RL agents efficiently explore meaningful contact sequences and joint trajectories within limited simulation time, enabling precise and robust locomotion. A minimal sketch of such a tracking reward follows.
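One concrete form of this coupling, in the spirit of Opt2Skill and Opt-Mimic below, is an exponential reward for tracking the TO reference. The sketch below is illustrative only; the reference layout, weights, and scales are placeholders rather than tuned settings.

```python
import numpy as np

def tracking_reward(q, qd, p_feet, ref, t, sigma=(0.5, 2.0, 0.05)):
    """Exponential tracking terms against a TO reference at timestep t.

    q, qd  : measured joint positions / velocities
    p_feet : measured foot positions (world frame, flattened)
    ref    : dict of reference arrays from trajectory optimization,
             e.g. ref["q"][t], ref["qd"][t], ref["p_feet"][t]
    """
    s_q, s_qd, s_f = sigma
    r_q  = np.exp(-np.sum((q - ref["q"][t]) ** 2) / s_q)            # joint tracking
    r_qd = np.exp(-np.sum((qd - ref["qd"][t]) ** 2) / s_qd)         # velocity tracking
    r_f  = np.exp(-np.sum((p_feet - ref["p_feet"][t]) ** 2) / s_f)  # contact locations
    return 0.5 * r_q + 0.1 * r_qd + 0.4 * r_f
```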
Closely related works
Reiter, Rudolf, et al. "Synthesis of model predictive control and reinforcement learning: Survey and classification." arXiv preprint arXiv:2502.02133 (2025).
Kamohara, Junnosuke, et al. "RL-augmented Adaptive Model Predictive Control for Bipedal Locomotion over Challenging Terrain." arXiv preprint arXiv:2509.18466 (2025).
Jeon, Se Hwan, et al. "Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control." arXiv preprint arXiv:2510.12717 (2025).
Kim, Hyeongjun, et al. "High-speed control and navigation for quadrupedal robots on complex and discrete terrain." Science Robotics 10.102 (2025): eads6192.
Wang, Renjie, et al. "Integrating Trajectory Optimization and Reinforcement Learning for Quadrupedal Jumping with Terrain-Adaptive Landing." arXiv preprint arXiv:2509.12776 (2025).
Cheng, Jin, et al. "RAMBO: RL-augmented Model-based Whole-body Control for Locomanipulation." IEEE Robotics and Automation Letters (2025).
Liu, Fukang, et al. "Opt2Skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation." arXiv preprint arXiv:2409.20514 (2024).
Hoeller, David, et al. "ANYmal parkour: Learning agile navigation for quadrupedal robots." Science Robotics 9.88 (2024): eadi7566.
Jenelten, Fabian, et al. "DTC: Deep Tracking Control." Science Robotics 9.86 (2024): eadh5401.
Marew, Daniel, et al. "A biomechanics-inspired approach to soccer kicking for humanoid robots." 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024.
Bellegarda, Guillaume, Chuong Nguyen, and Quan Nguyen. "Robust quadruped jumping via deep reinforcement learning." Robotics and Autonomous Systems 182 (2024): 104799.
Sleiman, Jean-Pierre, Farbod Farshidian, and Marco Hutter. "Versatile multi-contact planning and control for legged loco-manipulation." Science Robotics 8.81 (2023): eadg5014.
Bogdanovic, Miroslav, Majid Khadiv, and Ludovic Righetti. "Model-free reinforcement learning for robust locomotion using demonstrations from trajectory optimization." Frontiers in Robotics and AI 9 (2022): 854212.
Fuchioka, Yuni, Zhaoming Xie, and Michiel Van de Panne. "Opt-Mimic: Imitation of optimized trajectories for dynamic quadruped behaviors." arXiv preprint arXiv:2210.01247 (2022).