Overview. Our learning formulation separates task specification, operational limits, gait preference, and terrain adaptation. This separation lets natural locomotion with a low cost of transport (COT) emerge without any explicit gait priors, and the resulting policies transfer zero-shot to a physical robot.
LoComposition decomposes quadrupedal locomotion learning into rewards for the task, constraints for operational limits, energy minimization for gait preference, and perception for terrain adaptation. No hand-crafted gait priors are required.
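The page does not specify the exact reward terms, energy measure, or constraint-handling algorithm. The sketch below is a hypothetical illustration of the separation under assumed forms: an exponential velocity-tracking reward for the task, an absolute-joint-power penalty for energy, and torque and joint-velocity limits returned as separate violation flags that a constrained-RL method would consume instead of summing them into the reward. All function names, weights, and limits are placeholders, not values from LoComposition. Terrain perception would enter as a policy observation and is not shown here.

```python
import math


def task_reward(cmd_vel, base_vel, sigma=0.25):
    """Task specification: exponential velocity-tracking kernel (assumed form)."""
    err_sq = sum((c - m) ** 2 for c, m in zip(cmd_vel, base_vel))
    return math.exp(-err_sq / sigma)


def joint_power(torques, joint_vels):
    """Gait preference via energy: total absolute joint power, sum |tau * qdot| (W)."""
    return sum(abs(tau * qd) for tau, qd in zip(torques, joint_vels))


def limit_violations(torques, joint_vels, torque_limit, vel_limit):
    """Operational limits: returned as separate constraint flags, not reward terms."""
    return {
        "torque": any(abs(tau) > torque_limit for tau in torques),
        "joint_vel": any(abs(qd) > vel_limit for qd in joint_vels),
    }


def step_signals(cmd_vel, base_vel, torques, joint_vels,
                 energy_weight=1e-3, torque_limit=20.0, vel_limit=20.0):
    """Per-step learning signals; weights and limits are hypothetical placeholders."""
    # No air-time, contact-count, or foot-clearance terms appear here.
    reward = task_reward(cmd_vel, base_vel) - energy_weight * joint_power(torques, joint_vels)
    violations = limit_violations(torques, joint_vels, torque_limit, vel_limit)
    return reward, violations
```

For example, with perfect velocity tracking and twelve joints each at 2 N·m and 5 rad/s, `step_signals((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), [2.0] * 12, [5.0] * 12)` gives a reward of 1 − 0.001 × 120 = 0.88 and no violation flags. Keeping the flags out of the scalar reward means the limits are handled by whatever constraint mechanism consumes them rather than being traded off inside one complex reward.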
One mechanism per function, not one giant reward. Instead of entangling task tracking, safety limits, gait style, and terrain handling in a single complex reward, each gets its own dedicated mechanism in the learning formulation.
No gait priors, yet natural gaits emerge. We remove air-time, contact-count, and foot-clearance targets entirely; energy minimization alone steers the policy toward efficient, structured gaits.
Perception makes efficiency terrain-compatible. Exteroceptive sensing lets the policy spend energy selectively, staying economical on flat ground and expending more only where the terrain demands it.
Every component matters. Ablations show that removing any single component exposes a distinct failure mode: without constraints the policy is no longer deployable, without energy minimization gait quality degrades, and without perception efficiency on rough terrain breaks down.
Lower Energy, Fewer Limit Violations, Zero-Shot Transfer. Compared to a standard complex-reward baseline, our formulation achieves comparable terrain traversal with 56% lower cost of transport and 96% fewer operational-limit violations, and it transfers directly from simulation to a Unitree Go2 equipped with LiDAR-based elevation mapping.
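Cost of transport is the dimensionless ratio COT = P / (m g v): power divided by weight times speed. The page does not say whether P is mechanical joint power or electrical power, so the sketch below simply takes P as an input. It also shows how to read the headline figures: a 56% reduction means the new COT is 0.44 times the baseline's, and 96% fewer violations means 0.04 times as many, whatever the absolute values. The power, mass, and speed numbers below are made up for illustration and are not measurements from the paper.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def cost_of_transport(power_w: float, mass_kg: float, speed_mps: float) -> float:
    """Dimensionless COT = P / (m * g * v)."""
    if speed_mps <= 0:
        raise ValueError("COT is undefined at zero or negative speed")
    return power_w / (mass_kg * G * speed_mps)


def relative_reduction(baseline: float, ours: float) -> float:
    """Fraction by which `ours` is lower than `baseline` (0.56 means 56% lower)."""
    return (baseline - ours) / baseline


# Hypothetical numbers, not measurements from the paper.
baseline_cot = cost_of_transport(power_w=300.0, mass_kg=15.0, speed_mps=1.0)
ours_cot = cost_of_transport(power_w=132.0, mass_kg=15.0, speed_mps=1.0)
print(f"{relative_reduction(baseline_cot, ours_cot):.2f}")  # prints 0.56
```

Because mass and speed cancel when both policies are compared at the same robot mass and speed, the relative COT reduction here reduces to the ratio of powers.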
Abstract: Learning-based quadrupedal locomotion typically relies on complex reward formulations that entangle task specification, operational limits, gait preference, and terrain adaptation within a single optimization objective. We instead treat these functions through distinct mechanisms: rewards for task specification, constraints for operational limits, energy minimization for gait preference, and exteroceptive perception for adapting energy use to terrain difficulty. We show that these components jointly enable efficient, terrain-adaptive locomotion, and that removing each component exposes a distinct failure mode. Our formulation removes explicit gait priors (including air-time, contact-count, and foot-clearance targets) in favor of emergent behavior. Compared to a conventional complex-reward baseline, our formulation achieves comparable terrain traversal while reducing cost of transport by 56% and operational-limit violations by 96%. The resulting policies transfer zero-shot to a physical Unitree Go2 using LiDAR-based elevation mapping.
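The page states only that the robot perceives terrain through LiDAR-based elevation mapping. One common way to turn such a map into a policy observation is a height scan: terrain heights sampled at a fixed grid of points around the base and expressed relative to the base. The sketch below assumes a simple grid map stored as a 2-D list with nearest-cell lookup and clamping at the borders; the function name, sample grid, and map layout are hypothetical and not the authors' interface.

```python
import math


def height_scan(elev_map, resolution, base_xy, base_z, yaw,
                xs=(-0.4, -0.2, 0.0, 0.2, 0.4), ys=(-0.2, 0.0, 0.2)):
    """Return base height above terrain at body-frame sample points (m).

    elev_map[row][col] holds terrain height at world (col * resolution,
    row * resolution); lookups use the nearest cell, clamped to the map edge.
    Samples are ordered row by row: outer loop over ys, inner loop over xs.
    """
    n_rows, n_cols = len(elev_map), len(elev_map[0])
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    scan = []
    for dy in ys:
        for dx in xs:
            # Rotate the body-frame offset by yaw and translate to the world frame.
            wx = base_xy[0] + cos_y * dx - sin_y * dy
            wy = base_xy[1] + sin_y * dx + cos_y * dy
            # Nearest cell (round half up), clamped to valid indices.
            col = min(max(math.floor(wx / resolution + 0.5), 0), n_cols - 1)
            row = min(max(math.floor(wy / resolution + 0.5), 0), n_rows - 1)
            scan.append(base_z - elev_map[row][col])
    return scan
```

On a flat map every entry equals the base height; a 10 cm step ahead of the robot shrinks the forward entries by 0.1 m, which is the kind of signal a policy could use to decide where extra energy is worth spending.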