StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Abstract

Geometric organization of objects into semantically meaningful arrangements pervades the built world. As such, assistive robots operating in warehouses, offices, and homes would greatly benefit from the ability to recognize and rearrange objects into these semantically meaningful structures. To be useful, these robots must contend with previously unseen objects and receive instructions without significant programming. While previous works have examined recognizing pairwise semantic relations and sequential manipulation to change these simple relations, none have shown the ability to arrange objects into complex structures such as circles or table settings. To address this problem, we propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement and a structured language command encoding the desired object configuration. StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures with multi-object relational constraints inferred from the language command.

Overview

Examples of Language-Conditioned Rearrangement Data

Rearrange objects that have the same class as the plastic, cyan object into a medium circle in the middle center of the table facing east.

Rearrange objects that are blue into a medium circle in the middle left of the table facing north.

Rearrange objects that are taller than the yellow, plastic object into a medium circle in the middle left of the table facing east.

Rearrange objects that are taller than the red spoon into a small circle in the middle left of the table facing east.

Rearrange objects that are green into a medium circle in the middle center of the table facing north.

Rearrange objects that are green into a medium circle in the middle center of the table facing east.

Rearrange objects that have the same color as the metal object into a small line in the middle center of the table.

Rearrange objects that have the same class as the magenta object into a small line in the middle center of the table.

Rearrange objects that have the same size as the metal, cyan donut into a medium line in the middle center of the table.

Rearrange objects that have the same height as the green, plastic hammer into a medium line in the bottom center of the table.

Rearrange objects that have the same height as the glass stapler into a small line in the middle center of the table.

Rearrange objects that have the same height as the blue, glass power into a medium line in the middle center of the table.

Rearrange objects that have the same class as the magenta, plastic object into a tower in the middle left of the table facing west.

Rearrange objects that are green into a tower in the top center of the table facing east.

Rearrange objects that have the same class as the yellow object into a tower in the middle center of the table facing east.

Rearrange objects that have the same size as the blue object into a tower in the middle center of the table facing west.

Rearrange objects that are larger than the plastic object into a tower in the middle center of the table facing south.

Rearrange objects that are plastic into a tower in the top left of the table facing north.

Set the table.

Set the table.

Set the table.

Set the table.

Set the table.

Set the table.
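
The rearrangement instructions above follow a regular template: a selection clause, an optional size, a shape, a table location, and an optional facing direction. As a rough illustration only, the sketch below parses such a sentence into named fields. The `StructureCommand` class, its field names, the `parse_command` helper, and the `large` size value are all hypothetical and are not taken from the StructFormer code or paper; the actual structured command encoding may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical container; field names are illustrative, not the paper's encoding.
@dataclass
class StructureCommand:
    selection: str           # which objects to move, e.g. "are green"
    size: Optional[str]      # "small" or "medium" ("large" is an assumed value)
    shape: str               # "circle", "line", or "tower"
    location: str            # e.g. "middle center"
    facing: Optional[str]    # "north", "south", "east", "west", or None


# Pattern for the templated sentences shown in the examples above.
_PATTERN = re.compile(
    r"^Rearrange objects that (?P<selection>.+?) into "
    r"(?:a )?(?:(?P<size>small|medium|large) )?(?P<shape>circle|line|tower) "
    r"in the (?P<location>.+?) of the table"
    r"(?: facing (?P<facing>north|south|east|west))?\.$"
)


def parse_command(text: str) -> Optional[StructureCommand]:
    """Return the parsed fields, or None if the sentence does not fit the template."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    return StructureCommand(**match.groupdict())


if __name__ == "__main__":
    print(parse_command(
        "Rearrange objects that are green into a tower "
        "in the top center of the table facing east."
    ))
```

Instructions that do not fit the template, such as "Set the table.", return `None` in this sketch and would need separate handling.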

Additional Results

Section VI.A: Comparison with baselines on pose generation. The table is an expanded version of Table 2 in the paper, showing standard deviation in addition to mean error.

Section VI.B: To evaluate our method and baselines in simulation, we use 138 novel objects from 23 known object classes. The left figure shows the object models used for training; the right figure shows the object models used for evaluation.

Robot Demos

Rearrange mugs into a circle.

(Pose Generator)

Rearrange boxes into a line.

(Pose Generator)

Rearrange mugs into a line.

(Pose Generator)

Rearrange mugs into a circle.

(Object Selection Network + Pose Generator)