DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections


Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh


DROC (Distillation and Retrieval of Online Corrections) can respond effectively to online human language corrections, distill generalizable knowledge from corrections, and retrieve usable knowledge for future tasks.

Abstract

Today’s robot policies perform poorly when they must generalize to novel environments. Adapting to and learning from online human corrections is therefore essential but non-trivial: robots need to remember human feedback over time so that they can retrieve the right information in new settings and reduce the intervention rate, and they also need to respond to feedback that ranges from high-level human preferences to low-level adjustments of skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), an LLM-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC can respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms the baseline CaP, using only half of the total number of corrections needed in the first round and requiring little to no correction after two iterations.
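The distill-then-retrieve loop described in the abstract can be pictured with a small sketch. This is not DROC's implementation: the `Entry` and `KnowledgeBase` classes, the two-dimensional embedding vectors, the placeholder knowledge strings, and the equal weighting of textual and visual similarity are all assumptions made for illustration. The only detail taken from the abstract is that past experiences are retrieved by textual and visual similarity.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    task: str         # task description the knowledge was distilled from
    knowledge: str    # distilled, reusable takeaway (placeholder text here)
    text_emb: list    # hypothetical text embedding of the task
    visual_emb: list  # hypothetical visual embedding of the object


@dataclass
class KnowledgeBase:
    entries: list = field(default_factory=list)
    text_weight: float = 0.5  # illustrative weighting, not taken from the paper

    def add(self, entry):
        """Store knowledge distilled from a finished round of corrections."""
        self.entries.append(entry)

    def retrieve(self, text_emb, visual_emb):
        """Return the entry with the highest weighted text+visual similarity."""
        def score(e):
            return (self.text_weight * cosine(text_emb, e.text_emb)
                    + (1 - self.text_weight) * cosine(visual_emb, e.visual_emb))
        return max(self.entries, key=score, default=None)


kb = KnowledgeBase()
kb.add(Entry("open top white drawer", "placeholder note A", [1.0, 0.0], [1.0, 0.0]))
kb.add(Entry("pick up pen", "placeholder note B", [0.0, 1.0], [0.0, 1.0]))

# A new task whose made-up embeddings lie close to the drawer entry.
best = kb.retrieve([0.9, 0.1], [0.8, 0.2])
print(best.task)
```

In a real system the embeddings would come from language and vision models; the toy vectors above only make the ranking step concrete.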

Video

DROC_5min.mp4

Skill-Level Experiments

Open drawer

open_drawer_1.mp4

Train task: Open top white drawer.

Test task: Open top white drawer (different location).

Decrease in Number of Corrections: 5

open_drawer_2.mp4

Train task: Open middle white drawer.

Test task: Open top gray drawer.

Decrease in Number of Corrections: 6

Pick up object

pick_up_obj_1.mp4

Train task: Pick up pen.

Test task: Pick up spoon.

Decrease in Number of Corrections: 1

pick_up_obj_2.mp4

Train task: Pick up medium-sized tape.

Test task: Pick up large tape.

Decrease in Number of Corrections: 3

Put scissors into drawer

put_scissors_in_drawer_1.mp4

Train task: Put black scissors into white drawer.

Test task: Put blue tape into gray drawer.

Decrease in Number of Corrections: 9

put_scissors_in_drawer_2.mp4

Train task: Put red scissors into top white drawer.

Test task: Put black scissors into middle white drawer.

Decrease in Number of Corrections: 10

Put tape into drawer

put_tape_in_drawer_1.mp4

Train task: Put blue tape in middle white drawer.

Test task: Put green tape in top gray drawer.

Decrease in Number of Corrections: 3

put_tape_in_drawer_2.mp4

Train task: Put spoon in top gray drawer.

Test task: Put tape in top white drawer.

Decrease in Number of Corrections: 6

Hang cup on rack

hang_flipped_cup_train.mp4

Train task (a): Hang flipped cup on rack.

Number of Corrections: 10 

hang_upright_cup_train.mp4

Train task (b): Hang upright cup on rack.

Number of Corrections: 24

hang_flipped_cup_test.mp4

Test task: Hang flipped green cup on rack.

Retrieve train task (a)'s knowledge

Decrease in Number of Corrections: 1
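To see why the test task above would plausibly retrieve the knowledge from train task (a) rather than (b), the following sketch ranks the two train-task descriptions by word overlap with the test-task description. Word-level Jaccard similarity is a stand-in chosen for illustration; it is an assumption, not the textual-similarity measure DROC uses, and it ignores the visual similarity the system also relies on.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Task descriptions copied from the captions above.
train_tasks = {
    "a": "Hang flipped cup on rack",
    "b": "Hang upright cup on rack",
}
test_task = "Hang flipped green cup on rack"

scores = {key: jaccard(test_task, desc) for key, desc in train_tasks.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Under this toy measure, task (a) shares five of six distinct words with the test task while task (b) shares four of seven, so (a) ranks first, which matches the retrieval noted in the caption.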

Plan-Level Experiments

User Preference

Train task: Put scissors in drawer.

Test task: Clean the table.

Train task: Bring me a cup of coffee.

Test task: Make me a cup of coffee.

Feasibility of Plan

Train task: Heat milk in the fridge.

Test task: Slice the carrot.

Train task: Sort blocks to the drawer.

Test task: Sort the rest blocks to the drawer.

Common-Sense Reasoning

Train task: Set dinner table.

Test task: I want to have lunch.

Train task: Put shoes on rack.

Test task: Sort clothes to the shelf.

Scene Information

Train task: Place book in shelf.

Test task: Place DVD in shelf.

Train task: Bring me a pen.

Test task: Cut the paper into half.

Evaluation Results

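We show that our method DROC has three core abilities: responding effectively to online human language corrections, distilling generalizable knowledge from corrections, and retrieving usable knowledge for future tasks.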

Skill-Level Results

For skill-level tasks, we compare with the following baselines:

Plan-Level Results

For plan-level tasks, we compare with the following baseline:

* We report only the average number of corrections on the test tasks, because DROC and Ours–R share the same correction module and perform identically on the train tasks. The comparison therefore isolates DROC's distillation and retrieval module.



@inproceedings{zha2024distilling,
      title={Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections},
      author={Lihan Zha and Yuchen Cui and Li-Heng Lin and Minae Kwon and Montserrat Gonzalez Arenas and Andy Zeng and Fei Xia and Dorsa Sadigh},
      year={2024},
      booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
      organization={IEEE}
}