The code for our algorithm is released at https://github.com/josef-w/Differentiable-iLQR.
The main entry points are `mpc_explicit` and `lqr_step_explicit`.
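As a rough illustration of what differentiating through an LQR solve means (this is a minimal numpy sketch, not the repo's API; DiLQR computes these gradients analytically, whereas here we only check them by finite differences):

```python
import numpy as np

def lqr_cost(A, B, Q, R, x0, T=20):
    """Closed-loop quadratic cost of a finite-horizon discrete LQR,
    solved by the standard backward Riccati recursion."""
    P = Q.copy()
    gains = []
    for _ in range(T):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    x, cost = x0.copy(), 0.0
    for K in reversed(gains):  # gains[-1] is the gain for t = 0
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

# Toy double-integrator-like system (all values are illustrative).
A = np.array([[1.1, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)
x0 = np.array([1.0, 0.0])

# Finite-difference gradient of the closed-loop cost w.r.t. each entry
# of the dynamics matrix A -- the quantity a differentiable LQR/iLQR
# layer would provide analytically.
eps = 1e-6
grad = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        grad[i, j] = (lqr_cost(Ap, B, Q, R, x0)
                      - lqr_cost(Am, B, Q, R, x0)) / (2 * eps)
print(grad)
```

This gradient of the optimal-control cost with respect to dynamics parameters is what drives the dynamics-learning experiments below.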
A new rocket-landing demo built with our DiLQR.
Frame-by-Frame Visual Comparison
DiLQR vs DiffMPC: A Comprehensive Comparison
We provide detailed experiments comparing DiLQR and DiffMPC on both cost and dynamics (dx) learning tasks in the Cartpole environment, under settings with 50 and 100 training trajectories.
1. Aggregated Cost Learning Curve (Cartpole, Train=100)
Observation:
DiLQR converges significantly faster than DiffMPC.
2. Per-Dimension Cost Learning Trajectories
Although DiLQR and DiffMPC exhibit similar trends in Dim 1 and Dim 2, in Dim 0 the difference is significant:
DiLQR (blue) steadily converges toward the ground truth (black dashed), while DiffMPC (red) consistently diverges in the wrong direction, highlighting a failure to capture the correct gradient signal.
3. Aggregated Dynamics Learning Curve (Cartpole, Train=50 & Train=100)
Observation:
The aggregated dx learning curves show that DiffMPC's error plateaus early, especially in the train=50 setting, indicating premature convergence or stagnation.
In contrast, DiLQR continues to make steady progress, achieving lower final error in both train sizes.
Notably, under the train=50 condition, DiLQR achieves the best overall performance, reducing error by approximately 41% compared to DiffMPC.
While DiLQR’s learning speed is slightly slower in the train=100 setting, it compensates by producing physically meaningful parameters:
The bad-value rate (percentage of learned parameters that are negative) is 0% for DiLQR, compared to 16.7% for DiffMPC.
Negative values in the learned Jacobian are physically implausible and may cause instability in downstream control.
These results suggest that DiLQR learns not only more accurately but also more realistically, which is particularly important in safety-critical applications where physical validity matters.
Bad-Value Ratio (Negative Values in Learned Dynamics)