DROC: Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas,
Andy Zeng, Fei Xia, Dorsa Sadigh
DROC (Distillation and Retrieval of Online Corrections) can respond effectively to online human language corrections, distill generalizable knowledge from corrections, and retrieve usable knowledge for future tasks.
Abstract
Today’s robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Adapting to and learning from online human corrections is therefore essential, but it is a non-trivial endeavor: not only do robots need to remember human feedback over time so they can retrieve the right information in new settings and reduce the intervention rate, but they also need to respond to feedback that can range from high-level human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), an LLM-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity to improve performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from a sequence of online corrections into a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms the Code as Policies (CaP) baseline, requiring only half the total number of corrections in the first round and little to no corrections after two iterations.
Video
Skill-Level Experiments
Open drawer
Train task: Open top white drawer.
Test task: Open top white drawer (different location).
Decrease in Number of Corrections: 5
Train task: Open middle white drawer.
Test task: Open top gray drawer.
Decrease in Number of Corrections: 6
Pick up object
Train task: Pick up pen.
Test task: Pick up spoon.
Decrease in Number of Corrections: 1
Train task: Pick up medium-sized tape.
Test task: Pick up large tape.
Decrease in Number of Corrections: 3
Put scissors into drawer
Train task: Put black scissors into white drawer.
Test task: Put blue tape into gray drawer.
Decrease in Number of Corrections: 9
Train task: Put red scissors into top white drawer.
Test task: Put black scissors into middle white drawer.
Decrease in Number of Corrections: 10
Put tape into drawer
Train task: Put blue tape in middle white drawer.
Test task: Put green tape in top gray drawer.
Decrease in Number of Corrections: 3
Train task: Put spoon in top gray drawer.
Test task: Put tape in top white drawer.
Decrease in Number of Corrections: 6
Hang cup on rack
Train task (a): Hang flipped cup on rack.
Number of Corrections: 10
Train task (b): Hang upright cup on rack.
Number of Corrections: 24
Test task: Hang flipped green cup on rack.
DROC retrieves train task (a)'s knowledge.
Decrease in Number of Corrections: 1
Plan-Level Experiments
User Preference
Train task: Put scissors in drawer.
Test task: Clean the table.
Train task: Bring me a cup of coffee.
Test task: Make me a cup of coffee.
Feasibility of Plan
Train task: Heat milk in the fridge.
Test task: Slice the carrot.
Train task: Sort blocks to the drawer.
Test task: Sort the remaining blocks to the drawer.
Common-Sense Reasoning
Train task: Set dinner table.
Test task: I want to have lunch.
Train task: Put shoes on rack.
Test task: Sort clothes to the shelf.
Scene Information
Train task: Place book in shelf.
Test task: Place DVD in shelf.
Train task: Bring me a pen.
Test task: Cut the paper into half.
Evaluation Results
We show that our method, DROC, has three core abilities:
Accurately responds to online corrections
Distills generalizable knowledge
Retrieves relevant knowledge in novel tasks to improve performance
Skill-Level Results
For skill-level tasks, we compare with the following baselines:
CaP: Code as Policies
Ours-H: DROC with no initial history, showing that knowledge distillation from prior tasks is important
Ours-E: DROC without the relevant context extractor (uses the full interaction history for correction handling)
Ours-V: DROC that does not leverage visual retrieval
Plan-Level Results
For plan-level tasks, we compare with the following baseline:
Ours-R: DROC that does not distill knowledge and naively retrieves saved plans
* We report only the average number of corrections on test tasks here, because DROC and Ours-R share the same correction module and thus perform identically on the train tasks. This comparison isolates the contribution of DROC's distillation and retrieval modules.
@inproceedings{zha2024distilling,
title={Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections},
author={Lihan Zha and Yuchen Cui and Li-Heng Lin and Minae Kwon and Montserrat Gonzalez Arenas and Andy Zeng and Fei Xia and Dorsa Sadigh},
year={2024},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
organization={IEEE}
}