Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn
Stanford University | Robotics at Google
Our goal is to learn language-conditioned visuomotor skills on real robots.
To do so, we:
Label highly sub-optimal offline robot data (including autonomous exploration data or replay buffers of previously trained RL agents) with crowd-sourced natural language annotations.
Learn (1) a language-conditioned reward function from the annotated data and (2) a visual dynamics model from the offline observations and actions (see the first two sketches after this list).
Perform model predictive control with the learned dynamics and reward to complete language-specified tasks from visual inputs (see the planning sketch after this list).
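A minimal PyTorch sketch of the language-conditioned reward: a binary classifier over (initial image, current image, instruction) triples, trained with negatives drawn from other trajectories' annotations. All module names, sizes, and the assumed 768-d pretrained sentence embedding are illustrative, not the paper's exact architecture.

```python
# Hedged sketch of the language-conditioned reward classifier.
# Architecture details here are assumptions for illustration only.
import torch
import torch.nn as nn

class LanguageConditionedReward(nn.Module):
    def __init__(self, img_dim=64, lang_dim=128, hidden=256):
        super().__init__()
        # Encode the initial and current camera images.
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, img_dim),
        )
        # Project a precomputed 768-d sentence embedding (assumption:
        # instructions are embedded by a frozen pretrained language model).
        self.lang_enc = nn.Sequential(nn.Linear(768, lang_dim), nn.ReLU())
        # Score how well the change from s0 to st matches the instruction.
        self.head = nn.Sequential(
            nn.Linear(2 * img_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s0, st, lang):
        z = torch.cat(
            [self.img_enc(s0), self.img_enc(st), self.lang_enc(lang)], dim=-1
        )
        return self.head(z).squeeze(-1)  # logit: does (s0, st) satisfy `lang`?

# Training step: positives pair a transition with its crowd-sourced
# annotation; negatives pair the same images with an instruction drawn
# from a different trajectory.
reward = LanguageConditionedReward()
opt = torch.optim.Adam(reward.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(s0, st, lang_pos, lang_neg):
    logits_pos = reward(s0, st, lang_pos)
    logits_neg = reward(s0, st, lang_neg)
    loss = bce(logits_pos, torch.ones_like(logits_pos)) + \
           bce(logits_neg, torch.zeros_like(logits_neg))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```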
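The visual dynamics model is trained purely from the offline (observation, action, next observation) tuples, with no language input. The paper learns a visual (video-prediction style) dynamics model; the simpler action-conditioned latent model below stands in for it, with all names, sizes, and the assumed 32x32 frame resolution being illustrative.

```python
# Hedged stand-in for the learned visual dynamics model.
# Assumes 32x32 RGB frames; trained on offline transitions only.
import torch
import torch.nn as nn

class VisualDynamics(nn.Module):
    def __init__(self, act_dim=5, latent=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent),
        )
        # Predict the next latent state from (latent, action).
        self.trans = nn.Sequential(
            nn.Linear(latent + act_dim, 256), nn.ReLU(),
            nn.Linear(256, latent),
        )
        # Decode the predicted latent back to a 32x32 frame.
        self.dec = nn.Sequential(
            nn.Linear(latent, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, obs, act):
        z_next = self.trans(torch.cat([self.enc(obs), act], dim=-1))
        return self.dec(z_next)  # predicted next frame

# Trained with a simple reconstruction objective on offline data, e.g.:
# loss = torch.nn.functional.mse_loss(model(obs_t, act_t), obs_t1)
```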
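At test time, planning combines the two learned components: sample candidate action sequences, roll them through the dynamics model, score the predicted outcomes with the language-conditioned reward, and execute the best first action before replanning. The sketch below shows plain random-shooting MPC; the paper's planner may use a more sophisticated sampler (e.g., CEM), and `dynamics`/`reward` are the assumed interfaces from the sketches above, not a released API.

```python
# Hedged sketch of the model predictive control loop.
import torch

def plan_action(s0, lang, dynamics, reward,
                horizon=10, n_samples=200, act_dim=5):
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = torch.rand(n_samples, horizon, act_dim) * 2 - 1
    # Roll every candidate through the learned dynamics model.
    obs = s0.expand(n_samples, *s0.shape[1:]).clone()  # s0: (1, 3, H, W)
    for t in range(horizon):
        obs = dynamics(obs, actions[:, t])
    # Score predicted final frames with the language-conditioned reward.
    scores = reward(s0.expand_as(obs), obs, lang.expand(n_samples, -1))
    best = scores.argmax()
    return actions[best, 0]  # execute the first action, then replan
```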
Summary Video
lorel_5min_final.mp4
Qualitative Results
"Move the stapler"
0705_newview_movestapler.mp4
"Open the left drawer"
0705_newview_openleft.mp4
"Open the right drawer"
0705_newview_openright.mp4
"Reach the marker"
0705_newview_marker.mp4
"Reach the cabinet"
0705_newview_cabinet.mp4
Qualitative Results (Rephrased Instructions)
"Push the small gray stapler around on top of the black desk"
0705_newview_movestaplerlong.mp4
"Open the small black and white drawer on the left fully"
0705_newview_openleftlong.mp4