Physics-Based RL

The main research goal during my PhD was to design robots that could achieve the manipulation capabilities of human learners through exploration in natural environments. This presents a great many challenges, ranging from perception and model estimation to planning and control. At the time I focused on two-handed manipulation of large, furniture-like objects, and I wanted my robot to handle never-before-seen objects. To that end, my work was on robot intuitive physics: a machine-learning approach for efficiently modeling arbitrary world dynamics in terms of a space of latent physical quantities. Critically, and unlike previous approaches to manipulation with domestic robots, the underlying model representation adapted online to the data observed by the robot.

The high-level idea behind the approach is to place prior distributions over the modeling API of a physics engine (friction coefficients, masses, restitution coefficients, and so on), and update those distributions online as data comes in from the robot's cameras and force sensors. Rather than elaborating here, I'll just show an animation that hopefully conveys the intuition; see the paper for a fuller discussion.
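To make the idea concrete, here is a minimal sketch of that update loop in one dimension, under toy assumptions: `simulate` is a hypothetical stand-in for the physics engine's forward model, the latent quantity is a single friction coefficient, and the posterior is tracked with weighted particles rather than the actual machinery from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(mu, push_speed=5.0, dt=0.01, steps=100):
    """Toy forward model: a pushed block decelerating under friction mu.
    Returns the total distance slid. Stands in for a full physics engine."""
    g = 9.81
    v, x = push_speed, 0.0
    for _ in range(steps):
        v = max(0.0, v - mu * g * dt)
        x += v * dt
    return x

true_mu = 0.4
obs = simulate(true_mu) + rng.normal(0, 0.01, size=20)  # noisy sensor readings

# Prior over the latent friction coefficient: particles from a broad uniform.
particles = rng.uniform(0.05, 1.0, size=500)
weights = np.ones_like(particles) / len(particles)
preds = np.array([simulate(mu) for mu in particles])  # one rollout per hypothesis

# Online update: reweight hypotheses by simulation error as each datum arrives.
sigma = 0.01
for y in obs:
    weights *= np.exp(-0.5 * ((y - preds) / sigma) ** 2)
    weights /= weights.sum()

estimate = np.sum(weights * particles)
print(f"posterior mean mu ~ {estimate:.2f}")  # should land near true_mu = 0.4
```

The point of the sketch is the shape of the computation: hypotheses over physical parameters are scored by how well the simulator's predictions match observations, and the belief sharpens online as data accumulates.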

MCMC visualization for fitting the position parameters of a wheel constraint on a shopping cart. Real object motion is shown in red, and the simulated hypothesis in blue. The error bar represents a norm on the non-overlap between red and blue. The true constraint location is midway along the handlebar, where it captures the lateral friction effect of both fixed wheels.
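The sampling loop driving an animation like this can be sketched with plain Metropolis-Hastings. Everything here is a toy assumption: `rollout` is a hypothetical stand-in for simulating the cart under a candidate constraint position `p` along the handlebar (normalized to [0, 1]), and the likelihood scores the norm of the mismatch between real and simulated motion.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(p):
    """Toy simulator: summary of cart motion as a function of the
    constraint position p. A real version would be a physics rollout."""
    return np.array([np.sin(p), np.cos(p), 0.3 * p])

true_p = 0.5                      # "midway along the handlebar"
real_motion = rollout(true_p)

def neg_log_posterior(p, sigma=0.05):
    if not 0.0 <= p <= 1.0:       # uniform prior over the handlebar
        return np.inf
    err = np.linalg.norm(real_motion - rollout(p))  # motion mismatch
    return 0.5 * (err / sigma) ** 2

# Metropolis-Hastings with a Gaussian random-walk proposal on p.
p, nlp = 0.9, neg_log_posterior(0.9)
samples = []
for _ in range(5000):
    cand = p + rng.normal(0, 0.05)
    cand_nlp = neg_log_posterior(cand)
    if np.log(rng.uniform()) < nlp - cand_nlp:  # accept/reject step
        p, nlp = cand, cand_nlp
    samples.append(p)

posterior = np.array(samples[1000:])  # drop burn-in
print(f"posterior mean ~ {posterior.mean():.2f}")
```

Each accepted sample corresponds to one blue hypothesis frame in the animation, with the acceptance test playing the role of the visualized error bar.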

This method can be viewed as an online Bayesian approach to system identification, with a full simulator as the forward model. That's a pretty strong prior-knowledge assumption, and since moving to DeepMind I've been working on ways to encode this physics knowledge in a more abstract and general way, without compromising the sample efficiency of the stochastic physics-engine approach.

Some figures from the ICML, ICRA, and IROS papers are shown below. The take-home is that when this bias is appropriate (i.e. when the object's constraints are supported by the model), the agent can learn the dynamics with dramatically fewer observations than more general regression methods require. Unsurprisingly, this pays off in online performance, since the sooner you learn a decent model, the sooner you can start making good choices.

After my final IROS presentation this work was picked up by IEEE Spectrum and Discovery, which was a happy ending to my long journey with Krang (the robot).

Full NAMO (Navigation Among Movable Obstacles) task with two tables:

Learning multi-body dynamics through contact: