ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions
Brijen Thananjeyan*, Ashwin Balakrishna*, Ugo Rosolia, Joseph E. Gonzalez, Aaron Ames, Ken Goldberg
Paper | Code (Coming Soon)
Sample-based learning model predictive control (LMPC) strategies have recently attracted attention due to their desirable theoretical properties and their good empirical performance on robotic tasks. However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting. We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations and theoretically show that the resulting controller guarantees iterative improvement in expectation for stochastic nonlinear systems. We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous control tasks.
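To make the sample-based LMPC idea above concrete, below is a minimal Python sketch (not the authors' implementation) of a receding-horizon controller that plans with random shooting and requires the planned terminal state to land near a safe set of states from prior successful trajectories, each annotated with a cost-to-go estimate. The `dynamics` and `cost` callables, the safe-set radius, and all hyperparameters are assumptions made purely for illustration.

```python
import numpy as np

class SafeSet:
    """Safe set of states from prior successful trajectories, each stored with a
    sampled cost-to-go estimate (a simplified stand-in for the paper's sampled
    safe set and value function)."""
    def __init__(self):
        self.states, self.costs_to_go = [], []

    def add_trajectory(self, states, costs):
        # Cost-to-go at step t is the sum of the remaining per-step costs.
        ctg = np.cumsum(np.asarray(costs)[::-1])[::-1]
        self.states.extend(states)
        self.costs_to_go.extend(ctg.tolist())

    def terminal_value(self, state, radius=0.1):
        # Value of ending the planned horizon near the safe set; plans that end
        # far from it are treated as infeasible (infinite cost).
        if not self.states:
            return np.inf
        dists = np.linalg.norm(np.asarray(self.states) - state, axis=1)
        near = dists < radius
        return float(np.min(np.asarray(self.costs_to_go)[near])) if near.any() else np.inf

def sample_based_mpc(state, dynamics, cost, safe_set, horizon=10, n_samples=500, act_dim=7):
    """Random-shooting MPC: simulate sampled action sequences through `dynamics`,
    score them by accumulated cost plus the safe-set terminal value, and return
    the first action of the best sequence (receding horizon)."""
    best_score, best_action = np.inf, np.zeros(act_dim)
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        s, total = state, 0.0
        for a in actions:
            total += cost(s, a)
            s = dynamics(s, a)
        total += safe_set.terminal_value(s)
        if total < best_score:
            best_score, best_action = total, actions[0]
    return best_action
```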
We present experiments evaluating whether ABC-LMPC can (1) safely optimize policies while completing the task during learning, (2) seamlessly transfer to new goal sets, and (3) learn to start from novel start configurations. We show results for a 7-link arm reacher domain here; please see the paper for results on additional simulation domains. For all experiments, a trajectory cost below 50 indicates convergence to the goal from the current start state.
The objective is to guide the end effector to goal 0, and ABC-LMPC is initially provided with 100 hand-tuned demonstrations. The results show 0 constraint violations, consistent convergence to the goal, and iterative improvement, with ABC-LMPC converging to near-optimal performance in all 3 runs (matching the performance of SAVED, which is optimized for the fixed start/goal setting).
Here the goal is still to guide the end effector to goal 0, but from a new start state for the end effector at (-1, 0) in the map above. We find that ABC-LMPC iteratively shifts its start state toward the desired start state while never violating constraints, converging to goal 0 on every iteration (trajectory cost < 50).
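One way to picture the start-state adaptation described above is as a check-then-shift loop: propose a start state slightly closer to the desired one and keep it only if the controller can still complete the task from there. The sketch below is illustrative only; `can_complete_task` is a hypothetical callable wrapping controller rollouts, and the step size is an arbitrary choice.

```python
import numpy as np

def shift_start_state(current_start, target_start, can_complete_task, step=0.1):
    """Move the start state a small step toward the target start state, but only
    accept the move if the controller still reliably reaches the goal from the
    candidate (e.g., verified via rollouts); otherwise keep the current start."""
    direction = target_start - current_start
    dist = np.linalg.norm(direction)
    if dist < 1e-8:
        return current_start  # already at the desired start state
    candidate = current_start + min(step, dist) * direction / dist
    return candidate if can_complete_task(candidate) else current_start
```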
Here the goal is to adapt from guiding the end effector to goal 0 to guiding it to goal 1. The learning curves on the right show the mean trajectory cost for reaching goal 0 (blue) and, after the switch, for reaching goal 1 (red). We observe that the controllers seamlessly transfer to the new goal sets, which are illustrated in the plot on the left, and that even when the goal set is switched, they do not violate constraints and consistently reach the new goals.
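As a rough illustration of how prior experience might be reused when the goal set changes (a hypothetical sketch, not the procedure from the paper), one could rebuild the safe set from stored trajectories whose terminal states already lie in the new goal set, recomputing cost-to-go under the new goal's cost. Here `new_goal_check` and `new_cost_fn` are assumed callables, and `SafeSet` refers to the sketch above.

```python
def rebuild_safe_set_for_new_goal(trajectories, new_goal_check, new_cost_fn):
    """Illustrative only: keep stored state sequences that terminate inside the
    new goal set, recompute per-state costs under the new goal's cost function,
    and add them to a fresh SafeSet (defined in the earlier sketch)."""
    new_safe_set = SafeSet()
    for states in trajectories:
        if new_goal_check(states[-1]):
            costs = [new_cost_fn(s) for s in states]
            new_safe_set.add_trajectory(states, costs)
    return new_safe_set
```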