Deep SE(3)-Equivariant Geometric Reasoning for Precise Placement Tasks

Ben Eisner, Yi Yang, Todor Davchev, Mel Vecerik, Jon Scholz, David Held

arXiv: https://arxiv.org/abs/2404.13478

OpenReview: https://openreview.net/forum?id=2inBuwTyL2

Video: https://recorder-v3.slideslive.com/?share=92250&s=b9f932be-9352-4ef2-bcd2-56eea1e15aef

Code: https://github.com/r-pad/taxpose

Poster: https://docs.google.com/presentation/d/13gZIpr-f-6g7pXEkYx7QVyJXnM4fYneCMSDTgec-Qpg/edit?usp=drive_link


Many robot manipulation tasks can be framed as geometric reasoning tasks, in which an agent must precisely manipulate an object from a set of initial conditions into a position that satisfies the task. Often, task success is defined by the relationship between two objects, for instance hanging a mug on a rack. In such cases, the solution should be equivariant to the initial position of the objects as well as the agent, and invariant to the pose of the camera. This poses a challenge for learning systems that attempt to solve the task by learning directly from high-dimensional demonstrations: the agent must learn to be both equivariant and precise, which is difficult without inductive biases about the problem. In this work, we propose a method for precise relative pose prediction that is provably SE(3)-equivariant, can be learned from only a few demonstrations, and generalizes across variations within a class of objects. We accomplish this by factoring the problem into learning an SE(3)-invariant, task-specific representation of the scene and then interpreting this representation with novel geometric reasoning layers that are provably SE(3)-equivariant. We demonstrate that our method yields substantially more precise placement predictions in a simulated placement task than previous methods trained with the same amount of data.
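To make the equivariance requirement concrete, here is a minimal NumPy sketch of one common way to state it: if the action object (the mug) is moved by `T_a` and the anchor object (the rack) by `T_b`, a predicted placement transform `T_pred` should change to `T_b @ T_pred @ inv(T_a)`. The predictor below is a hypothetical toy stand-in, not the method from the paper or the code in the repository: it assumes known point correspondences to a single stored demonstration and uses a closed-form Kabsch alignment instead of a learned network. All function names (`random_se3`, `apply`, `kabsch`, `toy_predict`, `equivariance_error`) are illustrative assumptions.

```python
import numpy as np


def random_se3(rng):
    """Sample a random rigid transform as a 4x4 homogeneous matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis so that det(q) = +1 (a proper rotation)
    T = np.eye(4)
    T[:3, :3] = q
    T[:3, 3] = rng.normal(size=3)
    return T


def apply(T, pts):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return pts @ T[:3, :3].T + T[:3, 3]


def kabsch(src, dst):
    """Least-squares rigid transform mapping src onto dst (rows correspond)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = dst_c - R @ src_c
    return T


def toy_predict(action, anchor, demo_action_goal, demo_anchor):
    """Toy predictor (assumption: point order matches the demonstration).

    Returns the transform that moves `action` into the demonstrated
    relationship with `anchor`.
    """
    T_anchor = kabsch(demo_anchor, anchor)    # demo frame -> current scene
    goal = apply(T_anchor, demo_action_goal)  # where the action points should end up
    return kabsch(action, goal)


def equivariance_error(seed=0):
    """Max absolute deviation from T_b @ T_pred @ inv(T_a) on a random scene."""
    rng = np.random.default_rng(seed)
    demo_anchor = rng.normal(size=(20, 3))
    demo_action_goal = rng.normal(size=(15, 3))

    # A test scene: both objects are displaced independently from the demo.
    action = apply(random_se3(rng), demo_action_goal)
    anchor = apply(random_se3(rng), demo_anchor)
    T_pred = toy_predict(action, anchor, demo_action_goal, demo_anchor)

    # Move each object again by independent transforms T_a and T_b.
    T_a, T_b = random_se3(rng), random_se3(rng)
    T_new = toy_predict(apply(T_a, action), apply(T_b, anchor),
                        demo_action_goal, demo_anchor)

    T_expected = T_b @ T_pred @ np.linalg.inv(T_a)
    return np.abs(T_new - T_expected).max()


print(equivariance_error())
```

Setting `T_a` and `T_b` to the same transform corresponds to a change of camera or world frame: the prediction is conjugated by that transform, so the final mug-to-rack relationship is unchanged. A learned predictor can be checked with the same kind of test by swapping it in for `toy_predict`.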

Example of Precision Mug Placement

Below, we show predictions on 100 different mug placement examples, comparing two methods: Ours and TAX-Pose.

Ours

TAX-Pose (retrained)

GIF scrubbing back-and-forth between the two

Notice the more consistent alignment in Ours compared to TAX-Pose.

Example using our method in a simulated robot setting to grasp and place a mug

Intermediate motion is accomplished using motion planning.

output.mp4

Appendix

CoRL_2023___Geometric_Reasoning_appendix.pdf