Optimal Cost Design for Model Predictive Control

Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

University of California, Berkeley

In the Proceedings of the Third Annual Learning for Dynamics and Control Conference (L4DC) 2021.

Introduction/Motivation

One of the most commonly used planning paradigms in robotics is nonconvex Model Predictive Control (MPC). MPC optimizes trajectories based on a ground-truth cost function, C, that describes the task.

MPC uses four heuristics to enable tractable planning:

  1. Short Horizon: MPC plans for a finite horizon which is typically much shorter than the full horizon of the problem, which can lead to myopic behavior.

  2. Local Optimization: MPC performs local trajectory optimization, which leads to behavior that's only locally optimal.

  3. Replanning: MPC repeatedly replans a new trajectory following each observation, and only takes the first action from this plan. This can lead to behavior that fails to account for future replanning.

  4. Approximate Dynamics Model: MPC plans into the future using an approximate model of dynamics, which typically sacrifices accuracy for computational tractability. This can lead to compounding errors in the executed trajectory.
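The four heuristics above can be seen together in a minimal MPC loop. The sketch below is a toy illustration (not the paper's planner): it optimizes a short-horizon action sequence from a single initialization by finite-difference gradient descent over an assumed dynamics model, executes only the first action, and replans.

```python
import numpy as np

def rollout_cost(cost, dynamics, state, actions):
    """Cumulative cost of a short-horizon rollout under the planner's model."""
    total = 0.0
    for a in actions:
        state = dynamics(state, a)           # heuristic 4: approximate dynamics model
        total += cost(state, a)
    return total

def mpc_step(cost, dynamics, state, horizon=5, iters=50, lr=0.05):
    """One MPC step: locally optimize a short-horizon action sequence by
    finite-difference gradient descent, then return only the first action."""
    actions = np.zeros(horizon)              # heuristic 2: a single local initialization
    for _ in range(iters):                   # heuristic 1: horizon much shorter than the task
        grad = np.zeros(horizon)
        for t in range(horizon):             # finite-difference gradient estimate
            eps = np.zeros(horizon)
            eps[t] = 1e-4
            up = rollout_cost(cost, dynamics, state, actions + eps)
            dn = rollout_cost(cost, dynamics, state, actions - eps)
            grad[t] = (up - dn) / 2e-4
        actions -= lr * grad
    return actions[0]                        # heuristic 3: execute first action, then replan

# Toy usage: drive a scalar state toward a goal of 1.0.
cost = lambda s, a: (s - 1.0) ** 2
dynamics = lambda s, a: s + a
state = 0.0
for _ in range(10):                          # replanning loop
    state = dynamics(state, mpc_step(cost, dynamics, state))
```

All names and the scalar task here are illustrative; the point is only the structure of the loop, in which each heuristic is a distinct design choice.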

Performance of MPC planners when using the true cost C (gray) versus an optimized surrogate cost C' (orange), as measured by the cumulative true cost C.

We demonstrate that we can address the above issues via optimal cost design for model predictive control (OCD-MPC): given a ground-truth cost C and a planning algorithm, we find a surrogate cost C' such that optimizing for C' results in robot trajectories with minimal cost under the true cost function C. We propose using a zeroth order optimization technique (CMA-ES) to solve the OCD-MPC problem.

We demonstrate that finding an appropriate surrogate cost C' is tractable and analyze several scenarios where OCD-MPC alleviates suboptimality due to each of the four heuristics.
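The outer loop of OCD-MPC can be sketched as follows. For simplicity, this sketch uses a plain (mu, lambda) evolution strategy with a decaying step size as a stand-in for CMA-ES (which additionally adapts a full covariance matrix); `rollout_true_cost`, `theta0`, and the quadratic toy objective are illustrative assumptions.

```python
import numpy as np

def ocd_search(rollout_true_cost, theta0, pop=20, elites=5, sigma=0.3,
               decay=0.95, gens=40, seed=0):
    """Zeroth-order search over surrogate cost weights theta: sample candidate
    weight vectors, run the fixed MPC planner under each candidate surrogate
    cost C', score the resulting trajectory with the TRUE cost C, and
    recombine the elites. No gradients of the planner are needed."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(theta0, dtype=float)
    for _ in range(gens):
        cands = mean + sigma * rng.standard_normal((pop, mean.size))
        scores = np.array([rollout_true_cost(c) for c in cands])
        mean = cands[np.argsort(scores)[:elites]].mean(axis=0)  # elite recombination
        sigma *= decay                       # shrink the search distribution
    return mean

# Toy stand-in for "run MPC with weights theta, return cumulative true cost C":
target = np.array([2.0, -1.0])               # hypothetical best surrogate weights
theta = ocd_search(lambda th: np.sum((th - target) ** 2), np.zeros(2))
```

Because the planner is treated as a black box, the same loop applies unchanged whichever of the four heuristics causes the suboptimality.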

Scenario 1: Suboptimality Due to Short Planning Horizon

First, we provide an example of how OCD-MPC can alleviate problems resulting from MPC using a short planning horizon. In this scenario, the robot car (orange) starts behind a fixed-speed human car (black).

Because the human car ahead of the robot is traveling slower than the robot's target speed, the robot must either switch lanes to maintain speed or slow down and stay in the center lane -- the former behavior is preferred and has lower true cost.

However, optimizing the true cost with MPC yields suboptimal behavior. The finite-horizon planner used by MPC only plans 5 timesteps into the future, during which the car incurs costs for being on or close to the lane line, but does not plan far enough into the future to realize the benefit of increased speed afforded by a lane change.

By contrast, OCD-MPC finds a cost function C' which causes lane switching even with the default short-horizon MPC planner. This yields lower true cumulative cost under C, mitigating the effects of short-horizon planning.

MPC Trajectory with true cost C

MPC Trajectory with surrogate cost C'

Scenario 2: Suboptimality Due to Local Optimization

MPC Trajectory with true cost C

MPC Trajectory with optimized surrogate cost C'

In our second scenario, we consider how OCD-MPC can alleviate suboptimality due to local optimization. Here, the robot car (orange) is initialized to the left of the human car, and its task is to maneuver to the right lane. MPC with a single trajectory initialization using the true cost C converges to a local optimum, and fails to slow down and merge to the right lane. However, OCD-MPC is able to find a surrogate cost function C' that causes successful merging behavior, without changing the number of initializations.

Scenario 3: Suboptimality Due to Replanning

In our third scenario, we consider suboptimal behavior that results from the fact that MPC plans trajectories without accounting for future replanning. In this scenario, the robot is between two lanes, approaching a slow human car which is also between lanes.

The robot knows the human will merge into one of the two lanes in the future, thinks both cases are equally likely, and plans based on expected cost under its belief distribution. Because the robot car incurs cost for traveling slowly behind the human car, the optimal behavior is to merge into whichever lane becomes free, once the human chooses a lane. However, because MPC commits to a single open-loop plan, either lane it might choose carries some probability of a human ahead of it, causing the robot to slow down to avoid a collision and to arbitrarily pick a lane to merge into to avoid driving on the lane line. MPC does not perform contingency planning and cannot foresee that it can maintain a higher speed and will be in a better position to choose a lane in future timesteps.
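The gap between committing to one open-loop plan and branching on the human's choice can be made concrete with a small numeric example. All costs below are hypothetical, hand-picked numbers, not values from the paper:

```python
# Hypothetical true costs C for each (robot plan, human lane) pair:
cost = {
    ("left",  "left"): 10.0, ("left",  "right"):  1.0,
    ("right", "left"):  1.0, ("right", "right"): 10.0,
    ("slow",  "left"):  4.0, ("slow",  "right"):  4.0,
}
belief = {"left": 0.5, "right": 0.5}   # robot's belief over the human's merge

def expected_cost(plan):
    """Expected cost of committing to ONE open-loop plan, as MPC does."""
    return sum(p * cost[(plan, h)] for h, p in belief.items())

# MPC picks the best fixed plan under expected cost: "slow" (4.0 vs 5.5).
open_loop = min(("left", "right", "slow"), key=expected_cost)

# A contingent policy observes the human first, then picks the free lane,
# achieving expected cost 1.0 -- the behavior the surrogate cost recovers.
contingent = sum(p * min(cost[(r, h)] for r in ("left", "right", "slow"))
                 for h, p in belief.items())
```

The open-loop optimum is to slow down and hedge, while the contingent optimum takes the free lane; OCD-MPC shapes C' so that the non-contingent planner nonetheless produces the contingent behavior.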

On the other hand, the learned surrogate cost function results in emergent contingency planning: the robot maintains its starting velocity and then picks the free lane once the human reveals their lane preference.

MPC Trajectory with true cost C

MPC Trajectory with optimized surrogate cost C'


Scenario 4: Suboptimality Due to Approximate Dynamics Model

Finally, we create a dynamics mismatch by adding anisotropic noise to the true dynamics of the simulator. We simulate wind blowing across the highway by adding a small Gaussian-distributed lateral force in each of the three scenarios, then rerun OCD-MPC on these modified scenarios. The robot plans with the deterministic, wind-free dynamics model, but actually moves according to the transition function in which the car is affected by wind. We find that OCD-MPC not only learns a cost function that improves performance in each scenario; in all three scenarios, the learned cost function performs comparably to the case without dynamics mismatch.
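This mismatch can be sketched as two transition functions sharing the same nominal model. The state layout, timestep, and noise scale below are illustrative assumptions, not the paper's simulator parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(state, action, dt=0.1):
    """Deterministic dynamics the planner assumes (no wind).
    state = [x, y, vx, vy]; x is lateral, y is along the highway."""
    x, y, vx, vy = state
    ax, ay = action
    return np.array([x + vx * dt, y + vy * dt, vx + ax * dt, vy + ay * dt])

def true_step(state, action, wind_std=0.5, dt=0.1):
    """Executed dynamics: the same model plus a Gaussian crosswind force.
    The perturbation is anisotropic: it pushes only the lateral velocity."""
    nxt = model_step(state, action, dt)
    nxt[2] += rng.normal(0.0, wind_std) * dt    # lateral (x) velocity only
    return nxt
```

The planner rolls out `model_step` while the environment advances with `true_step`, so planning errors compound along the lateral axis; the surrogate cost can compensate for this bias even though the planner's model never changes.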

Performance of MPC planners under dynamics mismatch when using the true cost C (gray) versus an optimized surrogate cost C' (orange), as measured by the cumulative true cost C.