Deep Equilibrium Model Predictive Control
Swaminathan Gurumurthy, Khai Nguyen, Arun Bishop, Zachary Manchester, and Zico Kolter
CoRL 2025, Korea
Incorporating task-specific priors within a policy or network architecture is crucial for enhancing safety and improving representation and generalization in robotic control problems. Differentiable Model Predictive Control (MPC) layers have proven effective for embedding these priors, such as constraints and cost functions, directly within the architecture, enabling end-to-end training. However, current methods often treat the solver and the neural network as separate, independent entities, leading to suboptimal integration. In this work, we propose a novel approach that co-develops the solver and the architecture, unifying the optimization solver and network inference problems. Specifically, we formulate this as a joint fixed-point problem over the coupled network outputs and the necessary conditions of the optimization problem. We solve this problem iteratively, alternating between network forward passes and optimization iterations. Through extensive ablations on various robotic control tasks, we demonstrate that our approach yields richer representations and more stable training, while naturally accommodating warm starts, a key requirement for MPC.
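The sketch below illustrates the alternating scheme described above: a network pass refines the optimization problem's parameters given the current trajectory iterate, and a solver pass refines the trajectory given those parameters, repeated until a joint fixed point. This is a minimal, hedged illustration, not the authors' implementation; `PolicyNet`, `mpc_iteration`, `deq_mpc_forward`, and all dimensions and iteration counts are illustrative assumptions, and the inner solver is replaced by a toy tracking step.

```python
# Minimal sketch of the DEQ-MPC idea: alternate between a network forward
# pass that refines the problem parameters and a few optimizer iterations on
# the resulting trajectory, treating the pair as a joint fixed point.
# All names and the toy inner solver are assumptions for illustration only.

import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Hypothetical network mapping (observation, current trajectory iterate)
    to cost/reference parameters of the MPC problem."""

    def __init__(self, obs_dim, traj_dim, param_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + traj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, param_dim),
        )

    def forward(self, obs, traj):
        return self.net(torch.cat([obs, traj.flatten(-2)], dim=-1))


def mpc_iteration(params, traj, step=0.1):
    """Stand-in for a few inner solver iterations (e.g. one augmented
    Lagrangian / Gauss-Newton step). Here: a gradient step on a toy
    reference-tracking cost, to keep the sketch self-contained."""
    ref = params[..., : traj.numel() // traj.shape[0]].reshape_as(traj)
    return traj - step * (traj - ref)


def deq_mpc_forward(net, obs, traj_init, num_outer_iters=10):
    """Joint fixed-point iteration: network pass -> solver pass, repeated."""
    traj = traj_init
    for _ in range(num_outer_iters):
        params = net(obs, traj)             # network refines problem parameters
        traj = mpc_iteration(params, traj)  # solver refines the trajectory
    return traj


if __name__ == "__main__":
    horizon, state_dim, obs_dim = 5, 4, 8
    net = PolicyNet(obs_dim, horizon * state_dim, horizon * state_dim)
    obs = torch.randn(1, obs_dim)
    traj0 = torch.zeros(1, horizon, state_dim)  # cold start (or previous solution)
    print(deq_mpc_forward(net, obs, traj0).shape)
```

Because the trajectory iterate is an explicit input to the network, the same loop accepts a warm start by simply passing the previous solution as `traj_init`.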
A real quadrotor is tasked with navigating through a cluttered environment filled with numerous virtual obstacles.
Real robot demonstration
Playback with obstacles
We present navigation playbacks (from the respective hardware runs) for four methods. Obstacles that turn red indicate collisions.
DEQ-MPC-DEQ succeeds
Diff-MPC-DEQ crashes early
DEQ-MPC-NN collides
Diff-MPC-NN collides
We evaluate DEQ-MPC variants and Diff-MPC baselines on a range of challenging tasks in both domains:
(1) simulation, as shown in Table 1 (pendulum, cartpole, quadrotor, quadrotor-pole, and quadrotor-pole with static/dynamic obstacles),
(2) the real world, as shown in Table 2 (Crazyflie quadrotor navigating through static obstacles).
We demonstrate DEQ-MPC’s enhanced representation capabilities. First, DEQ-MPC variants scale more effectively with dataset size and model capacity. Second, they show less performance degradation as constraint complexity increases.
Ablation studies: generalization, network capacity, constraint hardness, gradient niceness, parameter sensitivity, and warm-starting.
Our experimental results highlight several key advantages of DEQ-MPC over differentiable MPC layers. The performance gap between DEQ-MPC variants and Diff-MPC becomes increasingly apparent as task complexity increases, whether through harder constraints, longer planning horizons, or increased problem sensitivity. A particularly promising aspect of DEQ-MPC is its favorable scaling behavior. Unlike Diff-MPC variants, which show signs of performance saturation, DEQ-MPC models continue to improve with increasing dataset size and network capacity, suggesting potential for exploiting scaling laws in robotics applications. Furthermore, DEQ-MPC's effectiveness under warm starts, requiring fewer augmented Lagrangian iterations while maintaining performance, offers significant practical advantages for real-world deployment. This advantage was also evident in our hardware experiments, where DEQ-MPC methods demonstrated superior reliability. Interestingly, there are trade-offs even between the DEQ-MPC variants: while DEQ-MPC-NN performs slightly better on average in simulation, DEQ-MPC-DEQ remains stable across a wider range of conditions, suggesting a trade-off between performance and stability.
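To make the warm-starting point concrete, the sketch below builds on the `deq_mpc_forward` sketch above: in a receding-horizon loop, the previous solution (shifted forward by one step) initializes the next joint iteration, so only a handful of outer (augmented Lagrangian style) iterations are needed instead of a full cold-start solve. All names and iteration counts are illustrative assumptions, not the authors' implementation.

```python
# Hedged warm-starting illustration, reusing `deq_mpc_forward` and `PolicyNet`
# from the sketch above. The iteration counts (10 cold, 3 warm) are assumed
# for illustration only.

import torch


def shift_trajectory(traj):
    """Shift the previous plan forward one step, repeating the last knot point."""
    return torch.cat([traj[:, 1:], traj[:, -1:]], dim=1)


def closed_loop(net, observations, horizon=5, state_dim=4,
                cold_iters=10, warm_iters=3):
    """Run the joint network/solver iteration once per control step,
    warm-started from the shifted previous solution."""
    traj = torch.zeros(1, horizon, state_dim)                       # cold start once
    traj = deq_mpc_forward(net, observations[0], traj, cold_iters)
    for obs in observations[1:]:                                    # warm starts after
        traj = deq_mpc_forward(net, obs, shift_trajectory(traj), warm_iters)
    return traj
```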