LILAC: Online Language Corrections for Robotic Manipulation via Shared Autonomy

Abstract

Modern systems for language-guided human-robot interaction require two key components for broad adoption: adaptivity and learning efficiency. Unfortunately, existing data-driven approaches for learning instruction-following agents cannot adapt: they fail to incorporate additional natural language supervision, and even if they could, they require hundreds of demonstrations to learn even simple policies.

In this work, we address these problems by presenting a framework for incorporating and adapting to natural language corrections, such as "to the right" or "no, towards the red book", as the robot executes. To focus on rich manipulation domains where the sample complexity of existing methods is prohibitive, we work within a shared autonomy paradigm: instead of discrete turn-taking between a human and a robot, agency is split between the two. In our approach, natural language is an input to a learned model that produces a meaningful, low-dimensional control space that the human can use to guide the robot. Each real-time correction refines the human's control space, enabling the execution of precise, extended behaviors, with the added benefit of requiring only a handful of demonstrations to learn.

We evaluate our approach via a user study, where users work with a Franka Emika Panda manipulator to complete complex manipulation tasks. Compared to existing learned baselines covering both open-loop instruction following and single-turn shared autonomy, we show that our corrections-aware approach obtains higher task completion rates and is subjectively preferred by users for its reliability, precision, and ease of use.

A user interacts with our proposed system. Upon starting, the user provides a high-level instruction, "pick up the book and insert it into the bookshelf," which induces a low-dimensional control space [Left] for controlling the robot (depicted with the joystick and shaded inputs). This control space is state- and language-conditioned, resulting in meaningful axes: pressing down on the joystick brings the end-effector close to the book, while holding up and left after grasping the book moves the end-effector towards the shelf [Middle]. However, these coarse controls are not enough to perform the task, and the user gets stuck. The core of our approach is the ability to provide corrections [Right] such as "tilt down a little bit," refining the control space so that pressing left reorients the end-effector and allowing the user to complete the task.
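The idea of a language-conditioned, low-dimensional control space can be illustrated with a minimal sketch. It assumes a hypothetical linear decoder (`LatentActionDecoder`, not the authors' learned model) that maps the robot state, an embedding of the current language, and a 2-DoF joystick input z to a 7-DoF action through a context-dependent matrix A(state, language), so that the action is A z. The weights are random placeholders for parameters that would be learned from demonstrations, and the language embeddings are made-up one-hot vectors rather than outputs of a real language model. Swapping the instruction embedding for a correction embedding changes A, and therefore changes what the same joystick push does.

```python
import random
from typing import List, Sequence


class LatentActionDecoder:
    """Hypothetical linear stand-in for a learned, language-conditioned decoder.

    action = A(state, language) @ z, where z is the 2-DoF joystick input and
    A is built as a context-weighted sum of basis matrices.
    """

    def __init__(self, context_dim: int, latent_dim: int = 2,
                 action_dim: int = 7, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.action_dim = action_dim
        # bases[k][i][j]: how context feature k contributes to A[i][j].
        # Random values are placeholders for learned parameters.
        self.bases = [
            [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
             for _ in range(action_dim)]
            for _ in range(context_dim)
        ]

    def control_matrix(self, state: Sequence[float],
                       language: Sequence[float]) -> List[List[float]]:
        """Return A(state, language): one column per joystick axis."""
        context = list(state) + list(language)
        if len(context) != len(self.bases):
            raise ValueError("state + language dimension does not match context_dim")
        A = [[0.0] * self.latent_dim for _ in range(self.action_dim)]
        for c, basis in zip(context, self.bases):
            for i in range(self.action_dim):
                for j in range(self.latent_dim):
                    A[i][j] += c * basis[i][j]
        return A

    def decode(self, state: Sequence[float], language: Sequence[float],
               z: Sequence[float]) -> List[float]:
        """Map a joystick input z to a full action under the current language."""
        if len(z) != self.latent_dim:
            raise ValueError(f"expected a {self.latent_dim}-D joystick input")
        A = self.control_matrix(state, language)
        return [sum(A[i][j] * z[j] for j in range(self.latent_dim))
                for i in range(self.action_dim)]


# Made-up inputs, for illustration only.
state = [0.4, 0.0, 0.3]                 # e.g., an end-effector position
instruction = [1.0, 0.0, 0.0, 0.0]      # stand-in embedding of the high-level instruction
correction = [0.0, 1.0, 0.0, 0.0]       # stand-in embedding of "tilt down a little bit"
decoder = LatentActionDecoder(context_dim=len(state) + len(instruction))

push_left = [-1.0, 0.0]
a_instruction = decoder.decode(state, instruction, push_left)
a_correction = decoder.decode(state, correction, push_left)  # same push, different motion
```

Because this sketch is linear in z, a centered joystick always yields a zero action and pushing twice as far doubles the command; a learned decoder need not have these properties.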

System Overview

Tasks (sample demonstration trajectories)

clean-trash

transfer-pen

open-drawer

insert-book

water-plant

Descriptions of Control Methods

* The underlying method was not revealed to participants.

Control Method A (Language-conditioned Imitation Learning Model)

Broad Summary: The only control input you provide is a button press.

Joystick Buttons:

<start> button: Start the robot
Any other button: Stop the robot

Control Method B (LILA)

Broad Summary: You have two control inputs. You control the robot with the right toggle on the joystick, and you can press the X button to return home.

Joystick Buttons:

Toggle: Moves in 2 degrees of freedom
B: Open/Close Gripper (binary value, just “press” not hold!)
X: Return robot arm to home state (and provide new instruction)
Start: End/Terminate the Session.

Tips for Control:

When “dropping” objects, you do not need to wait for the object to be perfectly close to the target - rely on gravity! :)
Be gentle near hard objects: collisions can result in automatic failure of the task.
For gripping, make sure the robot’s gripper is fully surrounding the area you wish to grasp before pressing B.
You can press X to return home at any time.
Please attempt the task in good faith! If you want to play around with inputs for fun, we can do so after :D

Control Method C (LILAC)

Broad Summary: This is the same as Method B, but you can provide language corrections whenever you want. You control the robot with the right toggle on the joystick. If you get stuck at any point during the task, press “A” to provide a correction (see the list of corrections below) and tell the proctor the correction you wish to provide. After the proctor types in the correction, you can again control the robot with the right toggle. When you are done with the correction, press “Y” to end it. This reverts to the previous language instruction, and you can continue controlling the robot with the right toggle. You can also press X at any time to return home.

Note: There is no limit to the number of corrections you can provide, so feel free to enter as many as you would like!

Corrections:

Moving in any direction (up/down/left/right/back/forward), tilting (up, down, left, right), and twisting (left, right).
Moving relative to known objects on the table: book, bookshelf, marker, marker holder, drawers of the shelf.
Provide your correction using language (e.g., say “move left”)

Joystick Buttons:

Toggle: Moves in 2 degrees of freedom
A: Indicate that you want to provide a correction
B: Open/Close Gripper (binary value, just “press” not hold!)
X: Return robot arm to home state (and provide new instruction)
Y: Indicate that you want to end the current correction and revert to the previous language instruction.
Start: End/Terminate the Session.
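The Method C button protocol can be summarized as a small state machine. The sketch below is a hypothetical illustration, not the study software: the class name `CorrectionSession` is invented, and it assumes that a new correction replaces any correction already in progress and that Y always reverts to the high-level instruction. Joystick motion itself is omitted; only the language that conditions the control space, the gripper, and the session state are tracked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorrectionSession:
    """Hypothetical bookkeeping for the Method C joystick protocol."""

    instruction: str                  # high-level instruction given at the home state
    correction: Optional[str] = None  # correction in progress, if any
    gripper_closed: bool = False
    active: bool = True

    @property
    def active_language(self) -> str:
        # A correction in progress conditions the control space; otherwise
        # the high-level instruction does.
        return self.correction if self.correction is not None else self.instruction

    def press(self, button: str, text: Optional[str] = None) -> None:
        if not self.active:
            raise RuntimeError("the session has already ended")
        if button == "A":
            # Start a correction; `text` is what the proctor types in.
            # Assumption: this replaces any correction already in progress.
            if text is None:
                raise ValueError("pressing A requires the correction text")
            self.correction = text
        elif button == "Y":
            # End the correction and revert to the previous instruction.
            self.correction = None
        elif button == "B":
            # Binary gripper toggle: one press opens or closes it.
            self.gripper_closed = not self.gripper_closed
        elif button == "X":
            # Return home and start over with a new instruction.
            if text is None:
                raise ValueError("pressing X requires a new instruction")
            self.instruction = text
            self.correction = None
        elif button == "Start":
            # End/terminate the session.
            self.active = False
        else:
            raise ValueError(f"unknown button: {button!r}")


session = CorrectionSession("pick up the book and insert it into the bookshelf")
session.press("B")                              # close the gripper on the book
session.press("A", "tilt down a little bit")
during = session.active_language                # "tilt down a little bit"
session.press("Y")
after = session.active_language                 # the original instruction again
session.press("Start")
```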