Task and Motion Planning with Large Language Models for Object Rearrangement

Yan Ding* 1, Xiaohan Zhang* 1, Chris Paxton 2, Shiqi Zhang 1 (* equal contribution)

1 SUNY Binghamton; 2 Meta AI

Accepted by IROS 2023

Abstract

Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning is frequently needed in this process. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not naively capture information about plausible physical arrangements of the world. 

We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. 

Framework of LLM-GROP

LLM-GROP comprises two key components: an LLM and a task and motion planner.

LLM for Computing Semantically Meaningful Object Configurations

Symbolic Spatial Relationship

Prompt 1: 

The goal is to set a dining table with objects. The symbolic spatial relationship between objects includes [spatial relationships]. [examples]. What is a typical way of positioning [objects] on a table? [notes]

LLM's Response (Example):

Place the plate in the center of the table.

Place the knife to the right of the plate.

Place the fork to the left of the plate.

...
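As a minimal sketch of how Prompt 1 can be instantiated in code, the helper below fills the template with concrete objects and spatial relations. The function name and argument layout are illustrative assumptions, not part of LLM-GROP's released code; the resulting string would be sent to the LLM.

```python
def build_symbolic_prompt(objects, relationships, examples="", notes=""):
    """Fill the Prompt 1 template (hypothetical helper) with concrete
    objects and symbolic spatial relationships."""
    parts = [
        "The goal is to set a dining table with objects.",
        f"The symbolic spatial relationship between objects includes "
        f"{', '.join(relationships)}.",
        examples,
        f"What is a typical way of positioning {', '.join(objects)} on a table?",
        notes,
    ]
    # Drop empty optional slots ([examples], [notes]) and join into one prompt.
    return " ".join(p for p in parts if p)

prompt = build_symbolic_prompt(
    objects=["a plate", "a knife", "a fork"],
    relationships=["to the left of", "to the right of",
                   "on top of", "in the center of"],
)
print(prompt)
```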

Geometric Spatial Relationship

Prompt 2: 

[object A] is placed [spatial relationship] [object B]. How many centimeters [spatial relationship] [object B] should [object A] be placed?

LLM's Response (Example):

The distance between the knife and the plate is 7cm. 

The distance between the fork and the plate is 5cm. 

...
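Responses to Prompt 2 arrive as free text, so the distances must be parsed back into numbers before the planner can use them. A minimal sketch, assuming responses follow the sentence pattern shown above (the regex and function name are illustrative, not from the paper's code):

```python
import re

def parse_distances(response):
    """Extract (object_a, object_b, centimeters) triples from lines like
    'The distance between the knife and the plate is 7cm.'"""
    pattern = re.compile(
        r"distance between (?:the )?(\w+) and (?:the )?(\w+) is "
        r"(\d+(?:\.\d+)?)\s*cm",
        re.IGNORECASE,
    )
    return [(a, b, float(d)) for a, b, d in pattern.findall(response)]

response = (
    "The distance between the knife and the plate is 7cm.\n"
    "The distance between the fork and the plate is 5cm.\n"
)
print(parse_distances(response))
# [('knife', 'plate', 7.0), ('fork', 'plate', 5.0)]
```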

LLM-Recommended Object Configuration

Task and Motion Planner for Realizing Object Rearrangements

Task Level: The robot needs to decide the sequence of object placements. For example, if a piece of bread goes on top of a plate, the robot must first place the plate and then the bread. 

Motion Level: The robot also needs to determine how to approach the table, such as from which side. Once a task plan is determined, the robot computes 2D navigation goals (standing positions for each placement) that connect the task and motion levels.
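The task-level ordering constraint can be sketched as a topological sort over stacking relations: an object placed on top of another depends on the object beneath it being placed first. The sketch below assumes such relations have already been extracted from the LLM's response; it uses Python's standard-library `graphlib` rather than the planner described in the paper.

```python
from graphlib import TopologicalSorter

def placement_order(on_top_of):
    """Return a valid placement sequence given stacking constraints.

    on_top_of maps each object to the object directly beneath it,
    e.g. {'bread': 'plate'}: the plate must be placed before the bread.
    """
    ts = TopologicalSorter()
    for above, below in on_top_of.items():
        ts.add(above, below)  # 'below' is a prerequisite of 'above'
    return list(ts.static_order())

print(placement_order({"bread": "plate"}))
# ['plate', 'bread']
```

With no stacking constraints, any placement order is valid; the sorter simply returns the objects in an arbitrary order.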

Four Demonstrations in Simulation

One Demonstration in Real World

We demonstrate LLM-GROP on real robot hardware. The real-robot system includes a Segway-based mobile platform and a UR5e robot arm. The robot employs hard-coded procedures for object grasping. The task is to serve a human with a knife, a fork, a cup, a plate, and a strawberry. The robot computes a plan that successfully avoids chairs and the human around the table, while being able to place the target objects in plausible physical positions.

BibTeX

@article{ding2023task,
  title={Task and motion planning with large language models for object rearrangement},
  author={Ding, Yan and Zhang, Xiaohan and Paxton, Chris and Zhang, Shiqi},
  journal={arXiv preprint arXiv:2303.06247},
  year={2023}
}