Learn from the Past: Language-conditioned Object Rearrangement
with Large Language Models
Object manipulation for rearrangement into a specific goal state is a significant task for collaborative robots. Accurately determining object placement is a key challenge, as misalignment can increase task complexity and the risk of collisions, reducing the efficiency of the rearrangement process. Most current methods rely heavily on pre-collected datasets to train a model that predicts the goal position. As a result, these methods are restricted to specific instructions, which limits their broader applicability and generalisation. In this paper, we propose a framework for flexible language-conditioned object rearrangement based on a Large Language Model (LLM). Our approach mimics human reasoning by using successful past experiences as a reference to infer the best strategy for reaching the current desired goal position. Building on the LLM's strong natural language comprehension and inference abilities, our method generalises to various everyday objects and free-form language instructions in a zero-shot manner. Experimental results demonstrate that our method can effectively execute robotic rearrangement tasks, even those involving long sequences of instructions.
Framework
Illustration of the proposed framework. The robot uses SAM for visual perception and CLIP for semantic understanding to identify where objects are in the environment and what they are. The LLM then associates the instruction with the most similar past experience and uses that experience as a reference. Finally, a prompt is built from the spatial and semantic information, allowing the LLM to predict the goal position for rearrangement.
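The retrieve-then-prompt step described in the caption can be illustrated with a minimal sketch. It assumes that past experiences are stored as records of an instruction plus object positions before and after a successful rearrangement. The bag-of-words cosine similarity is a swapped-in stand-in for whatever embedding the framework actually uses, which is not specified here. `Experience`, `retrieve`, and `build_prompt` are hypothetical names, and the object positions, which would come from SAM and CLIP perception in the real pipeline, are hard-coded.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record of one successful past rearrangement (assumed format).
@dataclass
class Experience:
    instruction: str
    scene: dict  # object name -> (x, y) before rearrangement
    goal: dict   # object name -> (x, y) after rearrangement


def bow(text: str) -> Counter:
    """Bag-of-words vector: an assumed stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(instruction: str, memory: list) -> Experience:
    """Return the past experience whose instruction is most similar to the query."""
    query_vec = bow(instruction)
    return max(memory, key=lambda exp: cosine(query_vec, bow(exp.instruction)))


def build_prompt(instruction: str, objects: dict, reference: Experience) -> str:
    """Combine the reference experience with the current scene into one prompt."""
    lines = ["Reference (past successful rearrangement):",
             f"  instruction: {reference.instruction}"]
    for name, (x, y) in reference.scene.items():
        gx, gy = reference.goal[name]
        lines.append(f"  {name}: ({x:.2f}, {y:.2f}) -> ({gx:.2f}, {gy:.2f})")
    lines.append("Current scene:")
    for name, (x, y) in objects.items():
        lines.append(f"  {name}: ({x:.2f}, {y:.2f})")
    lines.append(f"Instruction: {instruction}")
    lines.append("Predict the goal (x, y) of each object that must move.")
    return "\n".join(lines)


# Toy memory and scene (hypothetical data; the real pipeline perceives these).
memory = [
    Experience("put the apple on the plate",
               {"apple": (0.10, 0.20), "plate": (0.50, 0.50)},
               {"apple": (0.50, 0.50), "plate": (0.50, 0.50)}),
    Experience("put the cup on the left of the bowl",
               {"cup": (0.80, 0.10), "bowl": (0.40, 0.60)},
               {"cup": (0.25, 0.60), "bowl": (0.40, 0.60)}),
]
scene = {"eggplant": (0.70, 0.30), "plate": (0.30, 0.40)}
query = "Put the eggplant on the plate"

reference = retrieve(query, memory)  # picks the "apple on the plate" experience
prompt = build_prompt(query, scene, reference)
print(prompt)  # in the framework, a prompt like this would be sent to the LLM
```

Passing the reference's before and after coordinates, rather than only its instruction, gives the LLM a concrete placement to imitate. This is one plausible reading of "uses this similar experience as a reference", not a confirmed detail of the method.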
Outer Knowledge of Successful Rearrangement
Results
The first row shows results from random object placement. The second shows objects arranged in a horizontal line with a set gap between them. The third shows results from Dream2Real, and the fourth shows our proposed method without a reference or template (similar in structure to TidyBot). The fifth row presents results from our complete proposed method with a reference.
Video demos
Put the eggplant on the plate
Put one potato on the left of the plate and another one on the right
Put the eggplant on the right of the potato, then on the left of the pineapple