Adaptive Energy Regularization for Quadruped Locomotion

Boyuan Liang*1, Lingfeng Sun*1, Xinghao Zhu*1, Bike Zhang1, Ziyin Xiong1, Chenran Li1

Koushil Sreenath1, Masayoshi Tomizuka1

*Equal Contribution, Ordered Alphabetically Only            1UC Berkeley

[Paper] [Appendix] [Code(coming soon)]

Abstract

In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Pre-defined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we propose a simplified, energy-centric reward strategy to foster the development of energy-efficient locomotion across various speeds in quadruped robots. By implementing an adaptive energy reward function and adjusting its weight based on velocity, we demonstrate that our approach enables ANYmal-C and Unitree Go1 robots to autonomously select appropriate gaits—such as four-beat walking at lower speeds and trotting at higher speeds—resulting in improved energy efficiency and stable velocity tracking compared to previous methods that rely on complex reward designs and prior gait knowledge. The effectiveness of our policy is validated in the IsaacGym simulator and on real robots, demonstrating its potential to facilitate stable and adaptive locomotion.
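To make the idea of a velocity-adaptive energy penalty concrete, below is a minimal sketch (not the authors' released code) of such a reward term, written in PyTorch in the style of legged_gym reward functions. The weight schedule, the default coefficients, and the names `joint_torques`, `joint_velocities`, and `commanded_speed` are illustrative assumptions, not quantities taken from the paper.

```python
import torch


def adaptive_energy_reward(joint_torques: torch.Tensor,
                           joint_velocities: torch.Tensor,
                           commanded_speed: torch.Tensor,
                           base_weight: float = 2e-4,
                           speed_scale: float = 1.0) -> torch.Tensor:
    """Penalize mechanical power with a weight that adapts to the commanded speed.

    joint_torques, joint_velocities: (num_envs, num_joints)
    commanded_speed: (num_envs,) magnitude of the reference velocity command
    """
    # Instantaneous mechanical power per environment: sum_j |tau_j * qdot_j|
    power = torch.sum(torch.abs(joint_torques * joint_velocities), dim=-1)

    # Velocity-dependent weight: relax the energy penalty as the commanded
    # speed grows, so faster gaits are not over-penalized. This schedule is
    # one plausible choice, not the exact schedule used in the paper.
    weight = base_weight / (1.0 + speed_scale * commanded_speed)

    # Return a negative reward (penalty) proportional to consumed power.
    return -weight * power


if __name__ == "__main__":
    # Example with batch sizes typical of IsaacGym training.
    torques = torch.randn(4096, 12)        # 12 actuated joints (e.g. Go1, ANYmal-C)
    joint_vel = torch.randn(4096, 12)
    cmd_speed = torch.rand(4096) * 2.0     # commanded speeds in [0, 2] m/s
    r_energy = adaptive_energy_reward(torques, joint_vel, cmd_speed)
    print(r_energy.shape)                  # torch.Size([4096])
```

In practice this term would be summed with a velocity-tracking reward; because the energy weight shrinks as the commanded speed increases, the trade-off between tracking and efficiency shifts with velocity rather than being fixed by hand-tuned gait rewards.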

Automatic Gait Transition with Single Policy

The policy is trained with only a single round of RL, yet it selects the most energy-efficient gait for each reference velocity.

Ablation Comparisons

Without energy regularization, the resulting policy can be inefficient or even undeployable.

Hardware Deployment of our Policy on Go1