Adaptive Energy Regularization for Quadruped Locomotion

Boyuan Liang*1, Lingfeng Sun*1, Xinghao Zhu*1, Bike Zhang1, Ziyin Xiong1, Chenran Li1

Koushil Sreenath1, Masayoshi Tomizuka1

*Equal Contribution, Ordered Alphabetically Only            1UC Berkeley

[Paper] [Appendix] [Code(coming soon)]

Abstract

In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Pre-defined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we propose a simplified, energy-centric reward strategy to foster the development of energy-efficient locomotion across various speeds in quadruped robots. By implementing an adaptive energy reward function and adjusting its weight based on velocity, we demonstrate that our approach enables ANYmal-C and Unitree Go1 robots to autonomously select appropriate gaits—such as four-beat walking at lower speeds and trotting at higher speeds—resulting in improved energy efficiency and stable velocity tracking compared to previous methods that rely on complex reward designs and prior gait knowledge. The effectiveness of our policy is validated in the IsaacGym simulator and on real robots, demonstrating its potential to facilitate stable and adaptive locomotion.
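To make the idea of a velocity-adaptive energy penalty concrete, below is a minimal sketch (not the authors' released code) of such a reward term, written in PyTorch in the style of legged_gym reward functions. The weight schedule, the default coefficients, and the names `joint_torques`, `joint_velocities`, and `commanded_speed` are illustrative assumptions, not quantities taken from the paper.

```python
import torch


def adaptive_energy_reward(joint_torques: torch.Tensor,
                           joint_velocities: torch.Tensor,
                           commanded_speed: torch.Tensor,
                           base_weight: float = 2e-4,
                           speed_scale: float = 1.0) -> torch.Tensor:
    """Penalize mechanical power with a weight that adapts to the commanded speed.

    joint_torques, joint_velocities: (num_envs, num_joints)
    commanded_speed: (num_envs,) magnitude of the reference velocity command
    """
    # Instantaneous mechanical power per environment: sum_j |tau_j * qdot_j|
    power = torch.sum(torch.abs(joint_torques * joint_velocities), dim=-1)

    # Velocity-dependent weight: relax the energy penalty as the commanded
    # speed grows, so faster gaits are not over-penalized. This schedule is
    # one plausible choice, not the exact schedule used in the paper.
    weight = base_weight / (1.0 + speed_scale * commanded_speed)

    # Return a negative reward (penalty) proportional to consumed power.
    return -weight * power


if __name__ == "__main__":
    # Example with batch sizes typical of IsaacGym training.
    torques = torch.randn(4096, 12)        # 12 actuated joints (e.g. Go1, ANYmal-C)
    joint_vel = torch.randn(4096, 12)
    cmd_speed = torch.rand(4096) * 2.0     # commanded speeds in [0, 2] m/s
    r_energy = adaptive_energy_reward(torques, joint_vel, cmd_speed)
    print(r_energy.shape)                  # torch.Size([4096])
```

In practice this term would be summed with a velocity-tracking reward; because the energy weight shrinks as the commanded speed increases, the trade-off between tracking and efficiency shifts with velocity rather than being fixed by hand-tuned gait rewards.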

Automatic Gait Transition with Single Policy

The policy is trained with only a single round of RL, yet it selects the most energy-efficient gait for each reference velocity.

Ablation Comparisons

Without energy regularization, the resulting policy can be inefficient or even undeployable.

Hardware Deployment of our Policy on Go1