Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation

Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn

Stanford University | Robotics at Google


Paper / Code and Data

Our goal is to learn language-conditioned visuomotor skills on real robots.

To do so, we:

  • Label highly sub-optimal offline robot data (including autonomous exploration data or replay buffers of previously trained RL agents) with crowd-sourced natural language annotations.

  • Learn (1) a language-conditioned reward function from the annotated data and (2) a visual dynamics model from the offline data and actions.

  • Perform model predictive control with the learned dynamics and reward to complete language-specified tasks from visual inputs (a minimal sketch of this planning loop appears below).
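
To make the planning step concrete, here is a minimal sketch of one way to implement it: random-shooting model predictive control that samples candidate action sequences, rolls them out with the learned dynamics model, and ranks them with the language-conditioned reward. This is an illustrative sketch, not the released code; `dynamics`, `reward_fn`, `encode_image`, and `encode_text` are hypothetical stand-ins for the learned components described above.

```python
import numpy as np

def plan_action(obs, instruction, dynamics, reward_fn,
                encode_image, encode_text,
                horizon=10, num_samples=1000, action_dim=4):
    """Random-shooting MPC: sample candidate action sequences, roll them out
    with the learned dynamics model, score the predicted outcomes with the
    language-conditioned reward, and return the first action of the best one."""
    # Initial latent state, tiled across all candidates (hypothetical encoders).
    z0 = np.repeat(encode_image(obs)[None], num_samples, axis=0)
    lang = encode_text(instruction)  # instruction embedding

    # Sample candidate action sequences: (num_samples, horizon, action_dim).
    actions = np.random.uniform(-1.0, 1.0, size=(num_samples, horizon, action_dim))

    # Roll each candidate forward under the learned dynamics model.
    z = z0
    for t in range(horizon):
        z = dynamics(z, actions[:, t])

    # Score each rollout: does the predicted final state complete the instruction?
    scores = reward_fn(z0, z, lang)

    # Execute only the first action of the best sequence, then replan next step.
    return actions[np.argmax(scores), 0]
```

A cross-entropy method planner, which iteratively refits the sampling distribution to the top-scoring sequences, is a common refinement of this random-shooting loop.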


Summary Video

[Video: lorel_5min_final.mp4]

Qualitative Results

"Move the stapler"

[Video: 0705_newview_movestapler.mp4]

"Open the left drawer"

[Video: 0705_newview_openleft.mp4]

"Open the right drawer"

[Video: 0705_newview_openright.mp4]

"Reach the marker"

[Video: 0705_newview_marker.mp4]

"Reach the cabinet"

[Video: 0705_newview_cabinet.mp4]

Qualitative Results (Rephrased Instructions)

"Push the small gray stapler around on top of the black desk"

[Video: 0705_newview_movestaplerlong.mp4]

"Open the small black and white drawer on the left fully"

[Video: 0705_newview_openleftlong.mp4]