PLATO: Predicting Latent Affordances Through Object-Centric Play

Code: https://github.com/Stanford-ILIAD/plato_sandbox

PDF: https://arxiv.org/pdf/2203.05630.pdf

Video (see below)

Constructing a diverse repertoire of manipulation skills in a scalable fashion remains an unsolved challenge in robotics. One way to address this challenge is with unstructured human play, where humans operate freely in an environment to reach unspecified goals. Play is a simple and cheap method for collecting diverse user demonstrations with broad state and goal coverage over an environment. Due to this diverse coverage, existing approaches for learning from play are more robust to online policy deviations from the offline data distribution. However, these methods often struggle to learn under scene variation and on challenging manipulation primitives, due in part to improperly associating complex behaviors with the scene changes they induce. Our insight is that an object-centric view of play data can help link human behaviors to the resulting changes in the environment, and thus improve multi-task policy learning. In this work, we construct a latent space to model object affordances -- properties of an object that define its uses -- in the environment, and then learn a policy to achieve the desired affordances. By modeling and predicting the desired affordance across variable-horizon tasks, our method, Predicting Latent Affordances Through Object-Centric Play (PLATO), outperforms existing methods on complex manipulation tasks across diverse types of interactions, in both simulated 2D and 3D object manipulation environments and in the real world.

Video

Approach

Our method, depicted with respect to the three phases of an interaction: the pre-interaction period (purple, e.g., reaching), followed by the object interaction (blue, e.g., pulling), followed by the post-interaction period (green, e.g., detaching and reaching again). The red block is the ego agent, and the blue block is the object being manipulated. (1) The affordance posterior $E$ takes in a sampled window $\tau^{(i)}$ from the interaction period and learns to encode it into an object affordance distribution ($z$). (2) The learned prior $E'$ takes in the starting state of the object $o^{(i)}_1$ and a goal state of the object $o^g$ sampled from the post-interaction period, and learns an affordance distribution ($z'$) that matches the posterior. (3) The robot policy (blue) is trained to reconstruct the interaction actions $a^{(i)}_{1:H^{(i)}}$ conditioned on the affordance $z$. (4) The policy $\pi$ (purple) is also trained to reconstruct the pre-interaction actions, conditioned on the *future* affordance $z$. At test time, the latent affordance $z'$ produced by the prior $E'$ is used to decode robot actions through the policy $\pi$. In this way, both the policy and the prior networks remain within distribution at test time for variable or longer-horizon object interactions, and the latent space is structured to be informative about both the interaction-period and pre-interaction-period actions. Note that each post-interaction period is simply the next pre-interaction period, so these states are not left out of the learning process.
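To make the training objective concrete, below is a minimal PyTorch-style sketch of one PLATO training step. The network classes, batch field names (e.g., `interaction_window`, `obj_goal`), and the KL weight `beta` are illustrative assumptions, not the released implementation; see the code link above for the actual training setup.

```python
# Minimal sketch of one PLATO training step. All names and shapes here are
# illustrative assumptions; see the linked repository for the real code.
import torch
import torch.nn as nn
import torch.distributions as D

class GaussianHead(nn.Module):
    """MLP that maps its input to a diagonal Gaussian over the affordance z."""
    def __init__(self, in_dim, z_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))

    def forward(self, x):
        mu, log_std = self.net(x).chunk(2, dim=-1)
        return D.Normal(mu, log_std.clamp(-5, 2).exp())

def plato_loss(posterior, prior, policy, batch, beta=1e-2):
    # (1) Posterior E: encode a sampled interaction window tau^(i)
    #     (flattened states and actions) into q(z | tau).
    q_z = posterior(batch["interaction_window"])
    # (2) Learned prior E': initial object state o^(i)_1 plus a goal object
    #     state o^g from the post-interaction period -> p(z' | o_1, o^g).
    p_z = prior(torch.cat([batch["obj_init"], batch["obj_goal"]], dim=-1))
    z = q_z.rsample()
    # (3) Reconstruct the interaction actions a^(i)_{1:H} conditioned on z.
    loss_interact = ((policy(batch["interaction_states"], z)
                      - batch["interaction_actions"]) ** 2).mean()
    # (4) Reconstruct the pre-interaction (e.g., reaching) actions conditioned
    #     on the *future* affordance z.
    loss_pre = ((policy(batch["pre_interaction_states"], z)
                 - batch["pre_interaction_actions"]) ** 2).mean()
    # Pull the prior toward the posterior, so that at test time sampling
    # z' ~ p(z' | o_1, o^g) keeps both prior and policy in distribution.
    loss_kl = D.kl_divergence(q_z, p_z).sum(-1).mean()
    return loss_interact + loss_pre + beta * loss_kl
```

At test time, only the prior and policy are needed: sample $z'$ from the prior given the current and goal object states, then decode actions with the policy.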

Block2D Environment

Push

Pull

Lift

Tip

Side-Rotate

3D Environments

block3dflat_play.mov

Example Robot Play in Block3D-Flat

block3dplatform_play.mov

Example Robot Play in Block3D-Platform

mug_v3_better_view.mov

Example Robot Play in Mug3D-Platforms

dcab_all_better_view.mov

Example Robot Play in Playroom3D

3D Environments: Task Visualizations

First Row: Block2D Environment Primitive Examples. Each of these primitives can be executed from a variety of object initial conditions, masses, and dimensions.

Second Row: Block3D-Flat and Block3D-Platforms Primitive Examples. Again, object initial conditions, masses, and dimensions are varied during play. The left two primitives shown are taken from Block3D-Flat, and the right two are taken from Block3D-Platforms. These represent a subset of the evaluation primitives in the 3D environments, and are meant to show the diversity of tasks and behaviors our method is evaluated on.

Third Row: The left image shows an example primitive in Mug3D-Platforms. The right three images show sample tasks from Playroom3D.

3D Environment Results: Example PLATO Rollouts

evalnoretreatstop_imgs40.mp4

Block2D

c4mil_evalnoretreatstop_text_imgs20.mp4

Block3D-Flat

c1mil_evalnoretreatstopmorelift_text_imgs40.mp4

Block3D-Platforms

c4mil_ss_evalnoretreat_imgs20.mp4

Playroom3D

c6mil_ss_evalnoretreat_imgs20.mp4

Mug3D-Platforms

From left to right, evaluations of our method on the Block2D, Block3D-Flat, Block3D-Platforms, Playroom3D, and Mug3D-Platforms environments. Our method is able to perform a wide range of manipulation primitives in 2D and 3D environments by learning from object-centric interactions, as shown in the videos above. See our paper for quantitative results and ablation experiments.

Real World Environment: Pushing Tasks

Block-Real environment. We use the Franka Emika Panda 7DOF robot arm for our experiments. The green cube we use for block manipulation tasks is shown in the middle, with ArUco tags on each face for pose detection. The 6D pose of the object is estimated via a multi-camera setup in order to be robust to occlusion by the robot, as shown here with the camera on the far right. The ArUco tag in the middle is used to calibrate the extrinsics of the cameras and localize the object frame of reference relative to the robot.
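For reference, below is a minimal single-camera sketch of this kind of ArUco pose detection, written against the pre-4.7 `cv2.aruco` API; the tag size, dictionary choice, and helper names local to this snippet are assumptions rather than our exact pipeline.

```python
# Sketch of single-camera ArUco tag pose estimation (pre-4.7 cv2.aruco API).
# Tag size and dictionary choice are illustrative assumptions.
import cv2

TAG_SIZE = 0.04  # tag side length in meters (assumed)
ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
PARAMS = cv2.aruco.DetectorParameters_create()

def estimate_tag_poses(frame, camera_matrix, dist_coeffs):
    """Return {tag_id: (rvec, tvec)} for all detected tags, in the camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT, parameters=PARAMS)
    poses = {}
    if ids is not None:
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, TAG_SIZE, camera_matrix, dist_coeffs)
        for i, tag_id in enumerate(ids.flatten()):
            poses[int(tag_id)] = (rvecs[i], tvecs[i])
    return poses
```

In a multi-camera setup like ours, each camera's tag poses would then be mapped into a shared robot frame via the calibrated extrinsics (anchored by the table tag) and combined across views, so the block pose survives occlusion by the arm.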

Combined_LMP_real.mov
Combined_PLATO_real.mov

Block-Real: methods trained in simulation but evaluated in the real world. In order, 10 rollouts each of Push-Back, Push-Forward, Push-Left, and Push-Right.

Left: Play-LMP. Right: PLATO

PLATO and Play-LMP both generalize well to novel object dynamics, demonstrating the state-action coverage benefits of play. PLATO generalizes best to this unseen environment.

Interaction Quality: Examples of Intermittent Contact

PLATO does not assume access to a perfectly segmented play dataset. By conditioning the latent space on the initial and final object states, PLATO remains robust to occasional accidental contacts with the environment. Below are some examples of accidental contacts in our dataset that are classified as interactions during training (a sketch of this kind of contact-based segmentation follows the videos):

failure_dcab1.mp4
failure_dcab2.mp4
failure_dcab3.mp4
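For intuition, here is a minimal sketch of contact-based segmentation of a play trajectory into pre-interaction / interaction / post-interaction phases; the contact signal, minimum window length, and output format are illustrative assumptions, not the released segmentation code.

```python
# Sketch: segment a play trajectory into interaction windows using a
# per-timestep object-contact flag. Names and thresholds are assumed.

def segment_interactions(contact, min_len=3):
    """Return (pre_start, start, end) triples: pre-interaction spans
    [pre_start, start), interaction spans [start, end). Accidental grazes
    longer than min_len still become windows; PLATO tolerates these because
    the affordance is conditioned on the initial and final object states
    rather than on clean interaction labels."""
    windows, prev_end, t, T = [], 0, 0, len(contact)
    while t < T:
        if contact[t]:
            start = t
            while t < T and contact[t]:
                t += 1
            if t - start >= min_len:
                windows.append((prev_end, start, t))
                prev_end = t  # this post-interaction is the next pre-interaction
        else:
            t += 1
    return windows
```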