Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning
Ganga Nair*, Prakrut Kotecha*, Shishir Kolathaya
Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies but lacks the ability to adapt to varying environments. To address this limitation, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a Dreamer-derived reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated into the reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48% in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.
In nature, animals effortlessly switch between gaits—walking, trotting, or galloping—based on terrain and task demands, optimizing for energy efficiency and stability. Quadrupedal robots, however, often rely on fixed or preprogrammed gaits that limit adaptability in dynamic environments.
Our research focuses on enabling continuous and real-time gait adaptation in quadrupeds through a unified optimization framework. Unlike traditional Model Predictive Control (MPC), which assumes fixed gait parameters and depends on accurate dynamics models, or Reinforcement Learning (RL), which often converges to a single gait, our method integrates the strengths of both.
We combine Model Predictive Path Integral (MPPI) control with a Dreamer-based module that learns dynamics, rewards, and policies from diverse gait data. This allows online optimization over both actions and gait parameters, achieving smooth transitions and improved energy efficiency across different locomotion tasks.
This work moves toward making quadrupedal locomotion more adaptive, efficient, and terrain-aware, bridging the gap between biological versatility and robotic control.
Our framework employs an Actor-Critic structure trained with Proximal Policy Optimization (PPO), supported by two auxiliary modules: an Adaptation Module and a Dreamer Module. The adaptation module learns to infer privileged information from observation histories to assist control during deployment.
The Dreamer module learns a world model comprising policy, dynamics, reward, and value functions, trained in parallel with the RL policy. The cloned policy serves as a warm start for the MPPI planner, while the learned dynamics and reward models enable simulation-based planning and evaluation. This integration allows for efficient infinite-horizon optimization using value bootstrapping, enabling adaptive gait control and robust locomotion across conditions.
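To make the structure concrete, below is a minimal PyTorch-style sketch of how the Dreamer module's learned heads could be organized. The module names, network sizes, and the use of raw observations (rather than a latent state) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    """Small two-layer MLP used for each learned head (sizes are illustrative)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ELU(),
        nn.Linear(hidden, hidden), nn.ELU(),
        nn.Linear(hidden, out_dim),
    )

class DreamerModule(nn.Module):
    """Learned world model: dynamics, reward, value, and a cloned policy.

    obs_dim / act_dim / gait_dim are placeholders; the actual model may plan
    in a learned latent space rather than directly over observations.
    """
    def __init__(self, obs_dim, act_dim, gait_dim):
        super().__init__()
        in_dim = obs_dim + act_dim + gait_dim
        self.dynamics = mlp(in_dim, obs_dim)              # predicts the next state
        self.reward = mlp(in_dim, 1)                      # predicts the one-step reward
        self.value = mlp(obs_dim, 1)                      # bootstraps beyond the horizon
        self.policy = mlp(obs_dim + gait_dim, act_dim)    # cloned RL policy (warm start)

    def rollout(self, obs, gait, horizon):
        """Imagine a trajectory under the cloned policy; used to warm-start MPPI."""
        total_return = torch.zeros(obs.shape[0])
        actions = []
        for _ in range(horizon):
            act = self.policy(torch.cat([obs, gait], dim=-1))
            x = torch.cat([obs, act, gait], dim=-1)
            total_return += self.reward(x).squeeze(-1)
            obs = self.dynamics(x)
            actions.append(act)
        total_return += self.value(obs).squeeze(-1)       # infinite-horizon bootstrap
        return torch.stack(actions, dim=1), total_return
```

The value estimate added to the final imagined state is what turns the finite rollout into an approximation of an infinite-horizon objective, as described above.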
During deployment, our system performs real-time optimization of control actions and gait parameters using the Model Predictive Path Integral (MPPI) method. This sampling-based approach allows for smooth adaptation to task objectives and physical constraints without requiring gradient information.
At each control step, a set of trajectory samples is generated using the Dreamer policy and learned dynamics model as a warm start. Each trajectory is evaluated with the learned reward and value functions, capturing performance measures such as energy efficiency and velocity tracking. The top-performing (elite) trajectories are then used to update the sampling distribution, iteratively refining the control and gait strategy.
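The sketch below illustrates this sampling-and-elite-refit loop, assuming the world-model interface from the previous sketch. The horizon, sample counts, noise scales, and the simple elite-mean refit are illustrative assumptions; the actual MPPI update may use importance-weighted averaging of trajectories rather than a plain elite mean.

```python
import torch

@torch.no_grad()
def mppi_plan(world_model, obs, gait, horizon=12, samples=256, elites=32, iters=4,
              act_std=0.3, gait_std=0.1):
    """One control step of the sampling-based planner (hyperparameters are illustrative)."""
    # Warm start: mean action sequence imagined by the cloned Dreamer policy.
    act_mean, _ = world_model.rollout(obs.expand(1, -1), gait.expand(1, -1), horizon)
    act_mean = act_mean.squeeze(0)                     # (horizon, act_dim)
    gait_mean = gait.clone()                           # current gait parameters

    for _ in range(iters):
        # Jointly perturb action sequences and gait variables.
        acts = act_mean + act_std * torch.randn(samples, *act_mean.shape)
        gaits = gait_mean + gait_std * torch.randn(samples, gait_mean.shape[-1])

        # Evaluate each candidate with the learned dynamics, reward, and value heads.
        o = obs.expand(samples, -1)
        returns = torch.zeros(samples)
        for t in range(horizon):
            x = torch.cat([o, acts[:, t], gaits], dim=-1)
            returns += world_model.reward(x).squeeze(-1)
            o = world_model.dynamics(x)
        returns += world_model.value(o).squeeze(-1)    # infinite-horizon bootstrap

        # Refit the sampling distribution to the elite trajectories.
        elite_idx = returns.topk(elites).indices
        act_mean = acts[elite_idx].mean(dim=0)
        gait_mean = gaits[elite_idx].mean(dim=0)

    # Execute only the first action; replan from the new state at the next step.
    return act_mean[0], gait_mean
```

Because the planner only requires forward evaluations of the learned models, no gradient information is needed, which is what allows actions and gait variables to be optimized jointly at every control step.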
This framework enables continuous, adaptive gait selection and robust control in real time, leveraging the Dreamer’s learned world model for efficient, infinite-horizon planning across diverse locomotion conditions.
To validate our approach, we analyze fixed gaits—trot, pace, bound, and pronk—on flat terrain by comparing velocity tracking accuracy and energy efficiency across speeds. The results show that while trotting achieves low tracking errors, pronking becomes more energy-efficient at higher speeds. No single gait performs optimally across all conditions, motivating the need for adaptive gait selection based on task demands.
We evaluated energy efficiency using the Cost of Transport (CoT), which measures power consumption relative to speed and weight. Our adaptive gait framework achieves consistently lower CoT—about 15–20% less than fixed-gait baselines—across all commanded velocities. By dynamically adjusting gaits to match speed and terrain, the method avoids inefficiencies seen in single-gait policies and enables smoother, more energy-efficient locomotion.
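For reference, the standard definition of Cost of Transport used in legged robotics is given below; whether power is measured electrically or mechanically is an assumption left unspecified here.

\[
\mathrm{CoT} = \frac{\bar{P}}{m\, g\, v}
\]

where \( \bar{P} \) is the average power consumed, \( m \) the robot mass, \( g \) the gravitational acceleration, and \( v \) the average forward speed; lower values indicate more efficient locomotion.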
The figure shows how the planner continuously adapts its gait as the commanded forward velocity increases. The robot smoothly tracks the target velocity while maintaining cyclic and stable joint motions. Foot-contact visualizations reveal evolving interlimb coordination, demonstrating continuous gait transitions without relying on predefined gait labels. This highlights the planner’s ability to generate context-specific motion patterns naturally, without abrupt switching between modes. Throughout these transitions, the robot maintains consistent tracking accuracy and stable locomotion. Unlike typical RL-based controllers that favor a fixed trotting pattern, our planner dynamically adjusts coordination to balance efficiency and stability—achieving robust, adaptive locomotion through smooth gait modulation.
Coming Soon!!!