BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler,

Frederik Ebert, Corey Lynch, Sergey Levine, Chelsea Finn

We study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task.

Training Tasks

We collect a large-scale VR-teleoperated dataset of demonstrations for 100 manipulation tasks, and train a convolutional neural network to imitate closed-loop actions from RGB pixel observations. The images below show a single neural network performing a variety of behaviors conditioned on natural language instructions.

"Place Banana In Purple Bowl"

"Place Bottle Upright"

"Place The Ceramic Cup Over The Eraser"

"Place The Pepper in the Ceramic Cup"
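As a rough illustration of how one network can produce different behaviors from the same image when the instruction changes, here is a minimal sketch of a task-conditioned policy. It assumes a FiLM-style scale-and-shift of image features by the task embedding, which this page does not specify. The names `conditioned_policy`, `linear`, and `params`, along with all weights and dimensions, are hypothetical toy values rather than the trained model.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def conditioned_policy(
    image_features: Sequence[float],
    task_embedding: Sequence[float],
    params: Dict[str, Matrix],
) -> Vector:
    """Map image features plus a task embedding to an action vector.

    Assumption: conditioning is a per-feature scale (gamma) and shift (beta)
    predicted from the task embedding. The action head is shared by all tasks.
    """
    gamma = linear(params["w_gamma"], task_embedding)
    beta = linear(params["w_beta"], task_embedding)
    modulated = [
        (1.0 + g) * f + b for f, g, b in zip(image_features, gamma, beta)
    ]
    return linear(params["w_action"], modulated)


# Hypothetical toy weights: 3 image features, 2 embedding dims, 2 action dims.
params = {
    "w_gamma": [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]],
    "w_beta": [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
    "w_action": [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
}

features = [1.0, 2.0, 3.0]  # stand-in for CNN features of one RGB frame
action_a = conditioned_policy(features, [1.0, 0.0], params)  # instruction A
action_b = conditioned_policy(features, [0.0, 1.0], params)  # instruction B
print(action_a, action_b)
```

The same weights and the same image features yield different actions for the two embeddings, which is the property the language-conditioned examples above rely on.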

Generalization to New Language & Video Instructions

When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 24 unseen manipulation tasks with an average success rate of 44%, without any robot demonstrations for those tasks. Below are some videos of the robot following instructions it was never trained on:

"Place Grapes in Ceramic Bowl"

"Place Bottle In Tray"

"Push Purple Bowl Across The Table"

"Wipe Tray With Sponge"
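An average success rate over held-out tasks can be computed in more than one way. The sketch below assumes a per-task (macro) average: each task's success fraction is computed first, then the fractions are averaged, so every task counts equally regardless of trial count. This page does not say which convention was used. The function name, task names, and counts are placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def average_success_rate(results: Dict[str, Tuple[int, int]]) -> float:
    """Return the mean over tasks of (successes / trials).

    `results` maps a task name to (successes, trials).
    """
    if not results:
        raise ValueError("results must contain at least one task")
    per_task = []
    for task, (successes, trials) in results.items():
        if trials <= 0 or not 0 <= successes <= trials:
            raise ValueError(f"invalid counts for task {task!r}")
        per_task.append(successes / trials)
    return sum(per_task) / len(per_task)


# Placeholder counts for illustration only (not measurements from the paper).
placeholder_results = {
    "held_out_task_a": (3, 4),
    "held_out_task_b": (1, 4),
}
rate = average_success_rate(placeholder_results)
print(f"{rate:.0%}")
```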

Video Overview



Citation

Eric Jang, Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, Sergey Levine, and Chelsea Finn. BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning. In 5th Annual Conference on Robot Learning.



Scaling Up Task Complexity

  1. Variations in object positions

  2. Variations in scene background due to collecting in multiple locations

  3. Minor hardware differences between each robot

  4. Variation in object instances

  5. Multiple distractor objects (4-5)

  6. Closed-loop, RGB-only visuomotor control at 10 Hz with asynchronous inference. This results in well over 100 decisions per episode (i.e., long-horizon tasks that would be challenging for sparse RL objectives)
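The last item can be made concrete with a little arithmetic and a toy control loop. At 10 Hz, any episode longer than 10 seconds involves more than 100 decisions. The sketch below also models asynchronous inference as a one-tick delay, where the action applied in a tick was inferred from the previous tick's observation. That delay model, the 1-D state, and the proportional `toy_policy` are assumptions made for illustration, not the system's actual implementation.

```python
from typing import Callable, List

CONTROL_HZ = 10.0  # control rate stated in the list above


def decisions_per_episode(duration_s: float, control_hz: float = CONTROL_HZ) -> int:
    """Number of policy decisions made during an episode of the given length."""
    return round(duration_s * control_hz)


def run_closed_loop(
    policy: Callable[[float], float],
    initial_state: float,
    num_steps: int,
) -> List[float]:
    """Simulate closed-loop control with a one-tick inference delay.

    Each tick: observe, apply the action inferred during the previous tick,
    then infer the next action from the fresh observation.
    """
    state = initial_state
    pending_action = 0.0  # nothing has been inferred before the first tick
    trajectory = [state]
    for _ in range(num_steps):
        observation = state
        state = state + pending_action        # robot moves with the older action
        pending_action = policy(observation)  # inference overlaps with motion
        trajectory.append(state)
    return trajectory


def toy_policy(observation: float) -> float:
    """Hypothetical proportional controller that moves toward position 1.0."""
    return 0.5 * (1.0 - observation)


steps = decisions_per_episode(15.0)  # a 15-second episode at 10 Hz
trajectory = run_closed_loop(toy_policy, initial_state=0.0, num_steps=steps)
print(steps, len(trajectory))
```

Because the policy re-observes at every tick, it can correct for the drift the delayed action introduces, which is the practical benefit of closed-loop over open-loop execution.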

Other Tasks

Door Opening

We use BC-Z to open doors by driving the base.

Bin Emptying

Single-task policy for clearing rubbish from a bin


Eric Jang (1*), Alex Irpan (1*), Mohi Khansari (2), Daniel Kappler (2), Frederik Ebert (3†), Corey Lynch (1), Sergey Levine (1,3), Chelsea Finn (1,4)

1: Robotics at Google. 2: X, The Moonshot Factory. 3: University of California, Berkeley. 4: Stanford University.

*: Equal Contribution

†: Work done while author was at Google