Latent Space Planning for Unobserved Objects with Environment-Aware Relational Classifiers

Abstract

For all but the simplest of tasks, robots must understand how the objects they manipulate interact with structural elements of the environment. This becomes more challenging when some objects are occluded by the environment and thus not observable. In this work, we examine the problem of predicting inter-object and object-environment relations involving unobserved objects in novel environments from partial-view point clouds. Our approach enables robots to plan and execute manipulation sequences to complete tasks defined purely in terms of logical relations. The key to our method is a novel transformer-based neural network that both predicts object-environment relations and learns a latent-space dynamics function. We achieve reliable sim-to-real transfer without any finetuning.
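To make the pipeline concrete, below is a minimal PyTorch sketch of how a transformer-based relational classifier with a latent-space dynamics function could be structured. The class name, dimensions, relation vocabulary size, and the use of per-segment point-cloud features as input are all illustrative assumptions, not the exact architecture from the paper.

import torch
import torch.nn as nn

class RelationalDynamicsSketch(nn.Module):
    # Hypothetical sketch: a transformer encoder maps per-object and
    # per-environment segment features to latent tokens, a pairwise head
    # classifies relations, and a dynamics head advances the latent state
    # under a parameterized action.
    def __init__(self, feat_dim=256, n_relations=7, action_dim=8,
                 n_heads=4, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.relation_head = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, n_relations))
        self.dynamics_head = nn.Sequential(
            nn.Linear(feat_dim + action_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))

    def encode(self, segment_feats):
        # segment_feats: (B, n_entities, feat_dim) features from partial-view
        # point-cloud segments (objects and environment parts such as shelves).
        return self.encoder(segment_feats)

    def relations(self, latents):
        # Logits for every ordered entity pair: (B, N, N, n_relations).
        B, N, D = latents.shape
        a = latents.unsqueeze(2).expand(B, N, N, D)
        b = latents.unsqueeze(1).expand(B, N, N, D)
        return self.relation_head(torch.cat([a, b], dim=-1))

    def step(self, latents, action):
        # Latent-space dynamics: predict the next latent tokens after one
        # action (e.g., a pick-and-place or push parameterization).
        act = action.unsqueeze(1).expand(-1, latents.shape[1], -1)
        return self.dynamics_head(torch.cat([latents, act], dim=-1))

Goal relations such as Contact(white, shelf) = 1 can then be read off as sigmoid probabilities of the relation logits, either for the current observation or for a latent state rolled forward under candidate actions.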

Robot Experiments

The robot can reason about both pick-and-place and pushing actions in an environment with multiple shelves (Figure 1).

Goal: Contact(all red boxes, low shelf) = 1

Goal: Contact(white, coffee can) = 1

We show one example of how our model achieves desired goal relations involving unobserved objects (Figure 2).


Goal: Left(white cleaner, yellow mustard) = 1

Given the same goal relation Contact(white, shelf) = 1 in the same environment, but with a different initial object pose (standing versus lying down), our framework chooses between different actions (picking versus pushing) to achieve the goal relation. Furthermore, for the same scene, the robot understands how to manipulate an object to be above, below, or in contact with a shelf. The robot also chooses pick-and-place to achieve a desired object-environment contact relation when the shelf is high, and chooses to push when the shelf is low; a sketch of this relation-scored action selection follows the goal examples below.

Goal: Contact(white, shelf) = 1

Goal: Contact(white, shelf) = 1

Goal: Below(white, shelf) = 1

Goal: Contact(white, shelf) = 1
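The following is a rough sketch of this kind of action selection, reusing the hypothetical RelationalDynamicsSketch interface from the earlier block. The greedy one-step search and the encoding of goals as (relation, subject, object, target) tuples are simplifications for illustration, not the paper's actual planner.

import torch

def select_action(model, latents, candidate_actions, goal):
    # candidate_actions: list of (1, action_dim) tensors covering both
    # pick-and-place and pushing parameterizations.
    # goal: list of (relation_index, subject_index, object_index, target) tuples,
    #       e.g. a single Contact(white, shelf) = 1 constraint.
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        next_latents = model.step(latents, action)            # (1, N, D)
        probs = torch.sigmoid(model.relations(next_latents))  # (1, N, N, R)
        score = 0.0
        for rel, i, j, target in goal:
            p = probs[0, i, j, rel]
            score = score + (p if target == 1 else 1.0 - p)
        if float(score) > best_score:
            best_action, best_score = action, float(score)
    return best_action, best_score

Under this scheme, whether the chosen action is a pick-and-place or a push depends only on which candidate the predicted relations favor for the observed scene, matching the behavior described above.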

Given the same initial scene, the robot is tasked with moving all objects either to the boundary of the supporting table or off of it. The robot succeeds on tables of varying shape, size, and height. These results highlight the model's ability to ground object-environment semantic concepts in the geometry of the observed scene; a sketch of how such goals over all objects can be expanded into per-object constraints follows the goal examples below.

Goal: Boundary(all objects, table) = 1

Goal: Above(all objects, table) = 0

Goal: Boundary(all objects, table) = 1

Goal: Above(all objects, table) = 0
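The goals above quantify over all objects. A tiny illustration of how such a goal might be expanded into the per-object constraints consumed by the select_action sketch (the relation and entity indices, and the helper itself, are assumptions for illustration):

def expand_all_objects_goal(relation, object_indices, env_index, target):
    # Expand e.g. Boundary(all objects, table) = 1 into one
    # (relation, object, environment, target) constraint per object.
    return [(relation, i, env_index, target) for i in object_indices]

# Hypothetical indices: relation 3 = Boundary, entity 4 = the table segment.
boundary_goal = expand_all_objects_goal(relation=3, object_indices=[0, 1, 2],
                                        env_index=4, target=1)
# Above(all objects, table) = 0 is expressed the same way with target = 0.
off_table_goal = expand_all_objects_goal(relation=1, object_indices=[0, 1, 2],
                                         env_index=4, target=0)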