EMPPI

Model-based Generalization under Parameter Uncertainty using Path Integral Control

Ian Abraham, Ankur Handa, Nathan Ratliff, Kendall Lowrey, Todd D. Murphey, and Dieter Fox

Abstract: This work addresses the problem of robot interaction in complex environments where online control and adaptation are necessary. By expanding the sample space in the free energy formulation of path integral control, we derive a natural extension to path integral control that embeds uncertainty into action and provides robustness for model-based robot planning. We apply our algorithm to a diverse set of tasks using different robots and validate our results in simulation and real-world experiments. We further show that our method is capable of running in real time without loss of performance.

Our method is capable of controlling a variety of robotic systems to perform complex tasks while the parameters of the physics model are uncertain. Shown is the Spot Micro handstand task with mass uncertainty.

We test the algorithm in the real world under various forms of uncertainty. Each experiment is run in real time, with uncertainty in the object mass, the sliding friction, and the joint articulation of the cabinet (see left video for illustration). Objects in the environment are tracked with a camera and rendered in the physics simulation, which handles collisions and rigid body motion. From left to right, the tasks are to rotate the object 90 degrees, move the block towards the bottom of the tag, and finally open the drawer of the cabinet. Our algorithm accomplishes each task in a single execution by synthesizing actions derived from the uncertainty in the physical parameters of the model.

Parameter Uncertainty

We test our method on in-hand manipulation using the simulated Shadow Dexterous hand. The control signal is generated with uncertainty in the finger actuator gains. The video shows successful manipulation of the dice to match the target dice (shown above).

Our approach allows us to synthesize a control signal that can generalize to parameter uncertainty in the model. This example illustrates our method using a half-cheetah model attempting a backflip when the body masses and joint damping parameters are uncertain.
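The idea of weighting sampled actions by how well they perform across many candidate models can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the toy point-mass model, the cost, the function names (`rollout_cost`, `ensemble_path_integral_update`), and the choice to share each control perturbation across all parameter samples are all hypothetical simplifications.

```python
import numpy as np

def rollout_cost(x0, controls, mass, dt=0.05):
    """Cost of driving a toy 1-D point mass (hypothetical model) to position 1.0."""
    pos, vel = x0
    cost = 0.0
    for u in controls:
        vel = vel + dt * u / mass          # the uncertain mass enters the dynamics here
        pos = pos + dt * vel
        cost += (pos - 1.0) ** 2 + 1e-3 * u ** 2
    return cost

def ensemble_path_integral_update(x0, nominal, masses, n_traj=64,
                                  sigma=0.5, lam=1.0, rng=None):
    """One path-integral-style update over a joint (trajectory, parameter) sample space."""
    rng = np.random.default_rng() if rng is None else rng
    horizon = nominal.shape[0]
    # Control perturbations, shared across all parameter samples (a simplification).
    noise = rng.normal(0.0, sigma, size=(n_traj, horizon))
    # costs[k, m]: cost of perturbed control sequence k under parameter sample m.
    costs = np.array([[rollout_cost(x0, nominal + eps, m) for m in masses]
                      for eps in noise])
    # Exponentiated-cost weights, normalized over the whole joint sample space.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    # Summing over parameters rewards perturbations that do well across the ensemble.
    traj_weights = weights.sum(axis=1)
    return nominal + traj_weights @ noise, traj_weights

rng = np.random.default_rng(0)
masses = rng.uniform(0.5, 2.0, size=8)     # assumed range for the uncertain mass
nominal = np.zeros(20)
updated, traj_weights = ensemble_path_integral_update((0.0, 0.0), nominal, masses, rng=rng)
```

Because the weights are normalized over trajectories and parameters jointly, a perturbation that only works for one mass value receives little weight, which is one way to read "generalizing to parameter uncertainty" in this setting.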

Articulation Uncertainty

Our approach is further tested on environments where the uncertainty lies in the articulation of objects, e.g., drawers and doors. We run simulated tests of a robot opening a door using the Adroit hand. Each arrow indicates a candidate simulator parameter, which renders a world where the door is articulated according to the arrow's position and axis direction. Force feedback in the hand's joints is used to update the particle likelihood.

Here, we show that our method is able to quickly update the joint articulation without having to reset the simulated example. The control signal that is initially generated satisfies each simulator particle, resulting in conservative behavior until the particles reach a consensus on the parameters.
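Updating particle likelihoods from force feedback can be illustrated with a generic Bayesian reweighting step. This is a sketch under assumptions, not the update used in the paper: the Gaussian measurement model, the noise level `noise_std`, and the function name `update_particle_weights` are hypothetical.

```python
import numpy as np

def update_particle_weights(weights, predicted, measured, noise_std=0.1):
    """Reweight parameter particles by how well each one predicts the measurement.

    weights:   (n_particles,) prior particle weights
    predicted: (n_particles, n_sensors) joint forces predicted by each particle's simulator
    measured:  (n_sensors,) joint forces measured on the hand
    """
    residual = predicted - measured
    # Gaussian log-likelihood (assumed sensor model), up to an additive constant.
    log_lik = -0.5 * np.sum((residual / noise_std) ** 2, axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()             # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

prior = np.full(3, 1.0 / 3.0)
predicted = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])
measured = np.array([0.9, 0.6])
posterior = update_particle_weights(prior, predicted, measured)
```

In this toy example the second particle predicts forces closest to the measurement, so it ends up with most of the weight. Repeating the step as new measurements arrive concentrates the weight on consistent particles, which matches the consensus behavior described above.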

Ablation Study

Here, we perform an ablation study on the cart-pole swing-up task, with uncertainty in the mass and inertia of the pole, to see how the number of trajectory samples and the number of parameter samples affect the computation time and the performance of EMPPI. The number of parameter samples only affects the overall performance of EMPPI when the number of trajectory samples is low. This is due to the self-adaptive behavior of EMPPI: because the parameter samples are updated over time, it needs significantly more trajectory samples than parameter samples. The effect on computation time is as expected: as the number of samples increases, so does the computation time (the experiment was run on a CPU). Using a GPU or optimizing the parallel processing would improve the computation time significantly. However, not many samples are required to obtain real-time performance while succeeding at the swing-up task (a value below -1.0 counts as a successful swing-up).

Comparison against learned models

Comparison between EMPPI and MPPI with a learned model over 10 trials (lower is better). Neither example models the pole length. EMPPI uses a cart-pole model with a pole length of 0.6 m (the true pole length is 0.8 m). While MPPI will eventually reach the same cost as EMPPI, it takes significantly longer to build the implicit structure that EMPPI provides. This result is primarily due to model-based controllers being robust to incorrect or unmodeled effects, which allows consistent performance. The learned model is a neural network with 200 x 200 units, trained to predict the next state given the current state and control input.
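For readers unfamiliar with learned dynamics models, the forward pass of such a network might look like the sketch below. Several details are assumptions rather than facts from the experiment: "200 x 200" is read as two hidden layers of 200 units, the activation is tanh, the cart-pole state is taken to be 4-dimensional with a 1-dimensional control, and the weights are random rather than trained. The names `init_mlp` and `predict_next_state` are hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def predict_next_state(params, state, control):
    """Map (current state, control input) to a predicted next state."""
    h = np.concatenate([state, control])
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)             # assumed activation
    W, b = params[-1]
    return h @ W + b                       # linear output layer

rng = np.random.default_rng(0)
# Assumed sizes: 4-D state plus 1-D control in, two hidden layers of 200, 4-D state out.
params = init_mlp([4 + 1, 200, 200, 4], rng)
next_state = predict_next_state(params, np.zeros(4), np.array([0.5]))
```

A controller such as MPPI would call `predict_next_state` repeatedly to roll out sampled control sequences, so any error in the learned model compounds over the planning horizon until enough data has been collected.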