Kaixi Bao, Chenhao Li, Yarden As, Andreas Krause, Marco Hutter
ETH Zurich, Switzerland
Training reinforcement learning policies for legged locomotion often requires extensive environment interactions, which are costly and time-consuming. We propose Symmetry-Guided Memory Augmentation (SGMA), a framework that improves training efficiency by combining structured experience augmentation with memory-based context inference. Our method leverages robot and task symmetries to generate additional, physically consistent training experiences without requiring extra interactions. To avoid the pitfalls of naive augmentation, we extend these transformations to the policy’s memory states, enabling the agent to retain task-relevant context and adapt its behavior accordingly. We evaluate the approach on quadruped and humanoid robots in simulation, as well as on a real quadruped platform. Across diverse locomotion tasks involving joint failures and payload variations, our method achieves efficient policy training while maintaining robust performance, demonstrating a practical route toward data-efficient reinforcement learning for legged robots.
SGMA leverages the inherent symmetries of the robot and task to generate additional, physically consistent training experiences, thereby overcoming the high interaction costs and sample inefficiency of traditional randomization. Additionally, the framework incorporates a memory module, which infers latent task context from the agent's interaction history. This enables the policy to adapt its behavior across tasks, thereby addressing the common pitfall of context-unaware policies that tend to adopt over-conservative strategies in partially observable environments.
Robust RL locomotion often relies on domain randomization, forcing agents to experience every task variation -- even though many are simply symmetric copies of one another. If a policy learns to handle a left-leg failure, mirroring its trajectories yields training data for a right-leg failure. SGMA turns symmetry into free training data without new interactions.
Symmetry-based augmentation alone isn't enough. In partially observable tasks, naive augmentation causes policies to lose context and become overly conservative. SGMA addresses this with memory. It augments not only observations and actions, but also the policy's hidden states, preserving task context for both seen and augmented tasks.
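The mirroring idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the permutation indices, sign flips, and the assumption that the recurrent hidden state admits the same permutation/sign symmetry are all hypothetical placeholders that would depend on the actual robot's joint ordering and policy architecture.

```python
import numpy as np

# Hypothetical left/right mirror maps for a 4-dimensional toy example.
# A real quadruped's maps depend on its joint ordering and observation layout.
OBS_PERM = np.array([1, 0, 3, 2])            # swap left/right observation entries
OBS_SIGN = np.array([1.0, 1.0, -1.0, -1.0])  # flip lateral components
ACT_PERM = np.array([1, 0, 3, 2])            # swap left/right action entries

def mirror_transition(obs, act, next_obs, reward):
    """Produce a physically consistent mirrored copy of one transition.

    The reward is assumed symmetry-invariant, so the mirrored transition
    is valid extra training data at zero interaction cost.
    """
    m_obs = OBS_SIGN * obs[OBS_PERM]
    m_next = OBS_SIGN * next_obs[OBS_PERM]
    m_act = act[ACT_PERM]
    return m_obs, m_act, m_next, reward

def mirror_hidden(h, hidden_perm, hidden_sign):
    """Apply the same symmetry map to the policy's memory state, so the
    augmented trajectory keeps a consistent (mirrored) task context
    instead of a stale one -- the step that naive augmentation skips."""
    return hidden_sign * h[hidden_perm]
```

Because the mirror maps here are involutions (applying them twice recovers the original transition), the augmentation stays self-consistent over a whole trajectory, memory states included.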
SGMA learns more efficiently than policies trained with full task randomization, achieving comparable performance on both directly seen and augmented tasks.
Across 8 locomotion experiments, SGMA matches the performance of full task randomization without requiring additional interactions. In contrast, naive augmentation with feedforward policies (SGA-MLP) degrades performance on directly seen tasks due to the lack of context awareness.
We evaluate goal tracking under a joint failure at test time.
The baseline policy (SGA-MLP) exhibits over-conservative behavior -- small steps, reduced joint usage -- and often fails to reach the goal. SGMA instead adapts by reorienting and sidestepping to compensate for the failed joint, maintaining stable, goal-directed locomotion. We attribute this effective adaptation to memory-based context inference.
We attach the implementation details of our experiments below.
@article{bao2025toward,
title={Toward task generalization via memory augmentation in meta-reinforcement learning},
author={Bao, Kaixi and Li, Chenhao and As, Yarden and Krause, Andreas and Hutter, Marco},
journal={arXiv preprint arXiv:2502.01521},
year={2025}
}