Anonymous Authors
Abstract
Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.
Overview of VOCALoco. Given a heightmap of the local terrain, two high-level modules predict: (i) the viability of executing each skill and (ii) the Cost of Transport (CoT), an estimate of energy expenditure over a short fixed horizon. With both predictions at hand, we first filter out unsafe skills. Then, among the safe skills, we select the one with the lowest predicted energy expenditure as the final policy to execute on the robot; a minimal sketch of this selection rule is given below. The three example images show the ANYmal-D robot switching between different policies depending on the terrain type.
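The following sketch illustrates this two-step selection rule under simplifying assumptions: the viability and CoT modules are treated as callables that map a heightmap to per-skill predictions, and the names (viability_net, cot_net, VIABILITY_THRESHOLD) and the fallback behavior when no skill is deemed viable are illustrative, not the authors' exact implementation.

```python
import numpy as np

SKILLS = ["walk", "ascend", "descend"]
VIABILITY_THRESHOLD = 0.5  # assumed cutoff on predicted safety, in [0, 1]


def select_skill(heightmap: np.ndarray, viability_net, cot_net) -> str:
    """Pick the safest, most energy-efficient skill for the observed terrain."""
    viability = viability_net(heightmap)  # shape (num_skills,), predicted safety
    cot = cot_net(heightmap)              # shape (num_skills,), predicted CoT

    # Step 1: filter out skills predicted to be unsafe on this terrain.
    safe_mask = viability >= VIABILITY_THRESHOLD
    if not safe_mask.any():
        # Fallback when no skill is deemed viable (assumption: pick the
        # least-unsafe skill; the paper's exact fallback is not specified here).
        return SKILLS[int(np.argmax(viability))]

    # Step 2: among safe skills, choose the lowest predicted energy expenditure.
    masked_cot = np.where(safe_mask, cot, np.inf)
    return SKILLS[int(np.argmin(masked_cot))]
```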
In VOCALoco, we start by training low-level locomotion policies: a walking policy, an ascending policy, and a descending policy. We then perform rollouts with these policies, collecting the data used to train the high-level modules: the viability predictor and the CoT predictor (see the sketch below).
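A minimal sketch of this data-collection stage is shown below, assuming a simulation environment that exposes the local heightmap, per-step joint power, base displacement, and a fall flag; the API names (env.local_heightmap, info["joint_power"], etc.) and the CoT bookkeeping are assumptions for illustration, not the authors' exact pipeline.

```python
def collect_rollout_data(env, policies, horizon_steps=200):
    """Roll out each low-level policy and record training labels for the
    high-level modules: viability (did the robot stay safe over the horizon?)
    and measured Cost of Transport (CoT)."""
    dataset = []  # tuples of (heightmap, skill_id, viable_label, measured_cot)
    for skill_id, policy in enumerate(policies):  # walk / ascend / descend
        obs = env.reset()
        heightmap = env.local_heightmap()         # terrain snapshot at rollout start
        energy, distance, failed = 0.0, 0.0, False
        for _ in range(horizon_steps):
            action = policy(obs)
            obs, info = env.step(action)
            energy += info["joint_power"] * env.dt     # mechanical-energy proxy
            distance += info["base_displacement"]      # forward progress
            if info["fallen"]:
                failed = True
                break
        # CoT = energy / (mass * g * distance travelled over the fixed horizon)
        cot = energy / (env.robot_mass * 9.81 * max(distance, 1e-6))
        dataset.append((heightmap, skill_id, not failed, cot))
    return dataset
```

The heightmap recorded at the start of each rollout serves as the input to both high-level modules, while the safety outcome and measured CoT over the fixed horizon provide their supervision targets.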
Reward weights used in the paper