MimicGen
A Data Collection System for Scalable Robot Learning using Human Demonstrations

Overview Video

MimicGen Overview

MimicGen is a system for automatically generating large-scale, rich datasets from only a small number (~10) of human demonstrations by adapting them to new contexts.
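As a rough illustration of what adapting a demonstration to a new context can mean, the sketch below assumes that a demonstration segment is a sequence of end-effector poses recorded relative to a single object, and that the segment is replayed so the end effector keeps the same pose relative to that object after the object has moved. This is a simplified planar (x, y, yaw) example under those assumptions, not MimicGen's actual implementation. All names (`Pose`, `compose`, `inverse`, `adapt_segment`) are hypothetical.

```python
import math
from typing import List, Tuple

# Planar pose in the world frame: (x, y, yaw in radians). Hypothetical type for this sketch.
Pose = Tuple[float, float, float]


def compose(a: Pose, b: Pose) -> Pose:
    """Return a * b: pose b, given in frame a, expressed in a's parent frame."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


def inverse(a: Pose) -> Pose:
    """Return the inverse rigid transform of pose a."""
    ax, ay, at = a
    return (
        -math.cos(at) * ax - math.sin(at) * ay,
        math.sin(at) * ax - math.cos(at) * ay,
        -at,
    )


def adapt_segment(
    source_eef_poses: List[Pose],
    source_object_pose: Pose,
    new_object_pose: Pose,
) -> List[Pose]:
    """Re-target end-effector poses so they keep the same pose relative to the object."""
    object_from_world = inverse(source_object_pose)
    adapted = []
    for eef_pose in source_eef_poses:
        # End-effector pose expressed in the source object's frame.
        eef_in_object = compose(object_from_world, eef_pose)
        # Same relative pose, re-anchored at the object's new location.
        adapted.append(compose(new_object_pose, eef_in_object))
    return adapted
```

For example, calling `adapt_segment` with a source object at the origin and a new object pose that is shifted and rotated returns end-effector poses that approach the new object from the same relative direction as in the source demonstration.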

MimicGen produces large-scale datasets with minimal human effort

We used MimicGen to generate over 50,000 demonstrations from fewer than 200 human demonstrations across 18 tasks, multiple simulators, and the real world. This took considerably less human effort than prior work.

MimicGen can generate diverse datasets from 10 human demos including...

New Reset Distributions

In the example below, MimicGen generated 1000 demos for each of 3 reset distributions from just 10 human demos.

10 Human Demos

Generated Dataset
(Nominal Variant)

Generated Dataset
(Greater Variability)

Generated Dataset
(Greatest Variability)
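The three reset distributions above can be pictured as progressively wider regions from which an object's initial position is sampled. The sketch below is only an illustration of that idea: the variant names, the region half-widths, and the functions `sample_reset` and `sample_resets` are hypothetical and are not taken from MimicGen.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical half-widths (in meters) of the square region in which an object
# may be placed at reset. These values are illustrative assumptions only.
RESET_HALF_WIDTH_M: Dict[str, float] = {
    "nominal": 0.005,
    "greater": 0.10,
    "greatest": 0.20,
}


def sample_reset(variant: str, rng: random.Random) -> Tuple[float, float]:
    """Sample one (x, y) object offset uniformly from the variant's square region."""
    half_width = RESET_HALF_WIDTH_M[variant]
    return (
        rng.uniform(-half_width, half_width),
        rng.uniform(-half_width, half_width),
    )


def sample_resets(variant: str, count: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample `count` initial object offsets for one reset distribution."""
    rng = random.Random(seed)
    return [sample_reset(variant, rng) for _ in range(count)]
```

Calling `sample_resets("greatest", 1000)` would yield 1000 initial placements, one per generated demonstration, spread over the widest of the three hypothetical regions.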

We showcase several datasets generated by MimicGen across broad task reset distributions below.

Three Piece Assembly

Square

Stack Three

Coffee

Threading

Mug Cleanup

New Objects

10 human demos

1000 generated demos across 12 mugs

New Robot Hardware

10 human demos (Panda)

1000 generated demos (Sawyer)

1000 generated demos (IIWA)

1000 generated demos (UR5e)

Long-Horizon Tasks

Coffee Preparation

Kitchen

Pick Place

High-Precision Tasks

Gear Assembly

Frame Assembly

Mobile Manipulation

Mobile Kitchen

Real-World Tasks

200 MimicGen demos on Stack in large region

100 MimicGen demos on Coffee in large region

Simple Behavioral Cloning on MimicGen datasets can produce performant policies across diverse tasks

Stack Three D1 (91%)

Coffee D1 (93%)

Threading D1 (80%)

Square D1 (69%)

Three Piece Assembly D1 (61%)

Mug Cleanup O2 (67%)

Kitchen D1 (78%)

Coffee Preparation D1 (59%)

Mobile Kitchen D0 (77%)

Nut-and-Bolt Assembly D1 (96%)

Gear Assembly D1 (76%)

Frame Assembly D1 (71%)

Additional Results and Visualizations

MimicGen can produce good-quality datasets and policies from human operators of differing skill.

Below, we show MimicGen datasets generated on Square D2 from two source datasets: 10 demos from a better-quality operator and 10 demos from a worse-quality operator (both are from the robomimic Square MH dataset). Surprisingly, policies trained on each generated dataset achieve comparable results, which suggests that data quality might matter less in the large-scale data regime.

Dataset generated from 10 demos from a better-quality operator.

Dataset generated from 10 demos from a worse-quality operator.

Using MimicGen to generate as much data as a human operator would collect (e.g. 200 demos generated from 10 human demos vs. 200 human demos) can result in comparable policy performance.

Below, we show policies trained on 200 demos on Square D0. On the left, the policy was trained on 200 demos generated by MimicGen from 10 human demos; on the right, the policy was trained on 200 human demos. The agent performance is comparable. This raises important questions about the presence of redundancies in large human datasets and when to request additional data from a human.

Square D0 Policy (90.7%) trained on 200 MimicGen demos generated from 10 human demos.

Square D0 Policy (90.7%) trained on 200 human demos.