Learning-based Whole-body Control for Contact-Rich Behaviors
Period : 25.03 ~ Present
Goal of this project
This study explores approaches that combine model-based methods with reinforcement learning so that humanoid robots can overcome modeling uncertainties and perform locomotion, loco-manipulation, and contact-rich behaviors in real-world environments. Reinforcement learning has proven effective at reproducing robust, human-like walking, but it is less suited to tasks requiring precise loco-manipulation. When extended to loco-manipulation, joint-torque and contact-force references improve the precision of RL-based controllers, yet they require additional planning to manage contact sequences. For more complex contact-rich behaviors, contact-implicit trajectory optimization (CITO) provides dynamically feasible solutions, but its deployment is hindered by high computational load and sensitivity to initialization. To address these challenges, future work will explore the seamless integration of trajectory optimization with reinforcement learning, aiming to improve robustness and scalability in multi-contact scenarios that demand precise, contact-rich motions.
Trajectory optimization: Crocoddyl
Reinforcement learning: Isaac Lab
Sim-to-sim validation: MuJoCo (see the rollout sketch below)
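To illustrate the sim-to-sim step, the sketch below replays a trained policy in MuJoCo through its Python bindings. The model path, policy file, observation layout, and joint-position action interpretation are assumptions for illustration, not the project's exact pipeline.

```python
# Minimal sim-to-sim check: replay an Isaac Lab-trained policy in MuJoCo.
# Assumed for illustration: the policy is exported as TorchScript
# ("policy.pt"), observations are [qpos, qvel], and actions are joint
# position targets tracked by a PD loop.
import mujoco
import numpy as np
import torch

model = mujoco.MjModel.from_xml_path("humanoid.xml")  # robot MJCF (assumed path)
data = mujoco.MjData(model)
policy = torch.jit.load("policy.pt").eval()

KP, KD = 100.0, 2.0  # PD gains (assumed; tune per joint)

for _ in range(5000):
    obs = np.concatenate([data.qpos, data.qvel]).astype(np.float32)
    with torch.no_grad():
        q_target = policy(torch.from_numpy(obs)).numpy()
    # PD torque toward the policy's joint-position targets
    q = data.qpos[-model.nu:]   # actuated joints (assumed to be the last nu DoFs)
    dq = data.qvel[-model.nu:]
    data.ctrl[:] = KP * (q_target - q) - KD * dq
    mujoco.mj_step(model, data)
```

Discrepancies between this rollout and the Isaac Lab training behavior flag gaps in contact and actuator modeling before hardware deployment.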
Research Background
Preliminary Trials
1) Footstep-planner-guided reinforcement learning with momentum rewards
Implementation of the Angular Momentum-based Linear Inverted Pendulum (ALIP) planner in a learning framework (Isaac Lab); a foot-placement sketch is given below
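To make the planner concrete, the sketch below implements the closed-form sagittal-plane ALIP dynamics and the one-step-ahead foot-placement rule of Gibson et al. The parameter values, variable names, and scalar simplification are illustrative assumptions rather than the project's exact training code.

```python
# Sagittal-plane ALIP foot-placement sketch (after Gibson et al., 2022).
# State: x  = CoM position relative to the stance foot [m]
#        Ly = angular momentum about the stance contact [kg m^2/s]
import numpy as np

m, H, g = 30.0, 0.6, 9.81          # mass, CoM height, gravity (assumed values)
T = 0.4                            # step duration [s] (assumed)
ell = np.sqrt(g / H)               # LIP natural frequency

def alip_flow(x, Ly, t):
    """Closed-form ALIP dynamics: x' = Ly/(m H), Ly' = m g x."""
    c, s = np.cosh(ell * t), np.sinh(ell * t)
    x_t = c * x + s * Ly / (m * H * ell)
    Ly_t = m * H * ell * s * x + c * Ly
    return x_t, Ly_t

def foot_placement(x, Ly, Ly_des):
    """Desired swing-foot landing position, relative to the current stance
    foot, that drives Ly to Ly_des by the end of the next step."""
    x_td, Ly_td = alip_flow(x, Ly, T)          # predicted state at touchdown
    c, s = np.cosh(ell * T), np.sinh(ell * T)
    return x_td + (c * Ly_td - Ly_des) / (m * H * ell * s)

# Example planning query from the current measured state (values arbitrary)
p_des = foot_placement(x=0.05, Ly=2.0, Ly_des=3.5)
```

Feeding the optimized foot placement (and the predicted CoM state) to the RL policy as a reference is what enables the controllability and predictability noted under Advantages.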
Results
Advantages
Capable of controlling walking parameters such as step time and width
Enables prediction of future motions based on planner outputs (optimized foot placement and center of mass)
Disadvantages
A contact sequence is required for asymmetric locomotion
Additional rewards are required for human-like upper-body motions
Future work
RL for Real-Time Trajectory Optimization: Using reinforcement learning (RL) to provide initialization for trajectory optimization (TO) is expected to yield dynamically feasible solutions in multi-contact scenarios that require high accuracy and complex contact sequences (see the warm-start sketch below).
Trajectory Optimization for RL: Imitating dynamically feasible trajectories, including joint torques and contact forces, is expected to improve the precision and physical consistency of RL agents in contact-rich behaviors (see the tracking-reward sketch below).
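As a sketch of the first direction, a learned policy can supply the initial state-control guess for a Crocoddyl solve. The toy unicycle model below stands in for the humanoid multi-contact optimal-control problem, and `policy` stands in for a trained RL policy; only the warm-start mechanism itself (`solver.solve(xs, us, ...)`) is the point.

```python
# Warm-starting a Crocoddyl solve with a policy rollout (illustrative).
# ActionModelUnicycle is a stand-in for the humanoid contact dynamics.
import crocoddyl
import numpy as np

T = 50                                        # horizon length (assumed)
model = crocoddyl.ActionModelUnicycle()       # stand-in for contact dynamics
model.costWeights = np.array([10.0, 1.0])     # state / control cost weights
x0 = np.array([-1.0, -1.0, 1.0])
problem = crocoddyl.ShootingProblem(x0, [model] * T, model)
solver = crocoddyl.SolverFDDP(problem)
solver.setCallbacks([crocoddyl.CallbackVerbose()])

# Roll the policy through the dynamics to build the initial guess;
# a dynamically plausible warm start targets the initialization
# sensitivity of contact-implicit trajectory optimization noted above.
policy = lambda x: np.array([-0.5 * x[0], -0.5 * x[2]])  # placeholder policy
data = model.createData()
xs, us, x = [x0], [], x0
for _ in range(T):
    u = policy(x)
    model.calc(data, x, u)                    # one forward-dynamics step
    x = data.xnext.copy()
    xs.append(x)
    us.append(u)

solver.solve(xs, us, 100)                     # warm-started FDDP iterations
```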
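For the second direction, one plausible form of the imitation objective is an exponential tracking reward over joint positions, torques, and contact forces from the TO reference, in the spirit of Opt2Skill (Liu et al.). The kernel widths and term weights below are assumptions, not the project's exact reward shaping.

```python
# Tracking reward for imitating dynamically feasible TO references
# (illustrative sketch; weights are assumed placeholders).
import torch

def tracking_reward(q, tau, f_contact, q_ref, tau_ref, f_ref,
                    w_q=5.0, w_tau=0.01, w_f=0.001):
    # Exponential kernels: reward approaches 1 as tracking error vanishes
    r_q = torch.exp(-w_q * torch.sum((q - q_ref) ** 2, dim=-1))
    r_tau = torch.exp(-w_tau * torch.sum((tau - tau_ref) ** 2, dim=-1))
    r_f = torch.exp(-w_f * torch.sum((f_contact - f_ref) ** 2, dim=-1))
    # Torque and force terms push the policy toward the physically
    # consistent contact interactions encoded in the TO solution
    return 0.5 * r_q + 0.25 * r_tau + 0.25 * r_f
```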
Reference
Gibson, Grant, et al. "Terrain-adaptive, alip-based bipedal locomotion controller via model predictive control and virtual constraints." 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022.
Lee, Ho Jae, Seungwoo Hong, and Sangbae Kim. "Integrating model-based footstep planning with model-free reinforcement learning for dynamic legged locomotion." 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024.
Lee, Ho Jae, Se Hwan Jeon, and Sangbae Kim. "Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning." arXiv preprint arXiv:2507.04140 (2025).
Escontrela, Alejandro, et al. "Adversarial motion priors make good substitutes for complex reward functions." 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022.
Cheng, Jin, et al. "RAMBO: RL-augmented Model-based Whole-body Control for Locomanipulation." IEEE Robotics and Automation Letters (2025).
Marew, Daniel, et al. "A biomechanics-inspired approach to soccer kicking for humanoid robots." 2024 IEEE RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024.
Liu, Fukang, et al. "Opt2skill: Imitating dynamically-feasible whole-body trajectories for versatile humanoid loco-manipulation." arXiv preprint arXiv:2409.20514 (2024).
Kim, Gijeong, et al. "Contact-implicit Model Predictive Control: Controlling diverse quadruped motions without pre-planned contact modes or trajectories." The International Journal of Robotics Research 44.3 (2025): 486-510.
Mordatch, Igor, Emanuel Todorov, and Zoran Popović. "Discovery of complex behaviors through contact-invariant optimization." ACM Transactions on Graphics (ToG) 31.4 (2012): 1-8.