Diffusion Meets DAgger
Supercharging Eye-in-hand Imitation Learning
Xiaoyu Zhang Matthew Chang Pranav Kumar Saurabh Gupta
University of Illinois at Urbana-Champaign
Abstract
A common failure mode for policies trained with imitation learning is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, it fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger, approach to this problem simply collects more data to cover these failure states. In practice, however, this is often prohibitively expensive. In this work, we propose Diffusion Meets DAgger (DMD), a method that reaps the benefits of DAgger without the cost for eye-in-hand imitation learning problems. Instead of collecting new samples to cover out-of-distribution states, DMD uses recent advances in diffusion models to synthesize them. This leads to robust performance from only a few demonstrations. We compare DMD against a behavior cloning (BC) baseline across four tasks: pushing, stacking, pouring, and shirt hanging. In pushing, DMD achieves an 80% success rate with as few as 8 expert demonstrations, whereas naive behavior cloning reaches only 20%. In stacking, DMD succeeds on average 92% of the time across 5 cups, versus 40% for BC. When pouring coffee beans, DMD transfers to another cup successfully 80% of the time. Finally, DMD attains a 90% success rate for hanging a shirt on a clothing rack.
DMD System Overview. Our system operates in three stages.
a) A diffusion model is trained, using task and play data, to synthesize novel views relative to a given image.
b) This diffusion model is used to generate an augmenting dataset of off-trajectory views synthesized from the expert demonstrations. Labels for these views (cyan arrows) are constructed such that off-trajectory views still converge towards task success (right). Images with a green border are from trajectories in the original task dataset. Purple-outlined images are diffusion-generated augmenting samples.
c) The original task data and augmenting dataset are combined for policy learning.
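To make the augmentation stage concrete, the sketch below shows one simplified way such an augmenting dataset could be assembled. It is not the authors' implementation: poses and actions are reduced to 3-D translations, the diffusion renderer is stubbed out, and all function names and parameters are hypothetical.

```python
# A minimal sketch of the DMD-style augmentation idea, not the authors' code.
# Assumptions: poses and actions are 3-D translations in the wrist frame, and
# synthesize_view() is a stub standing in for the novel-view diffusion model.
import numpy as np

def synthesize_view(expert_image, relative_offset):
    """Placeholder for the view-synthesis diffusion model: given an on-trajectory
    wrist-camera image and a relative pose offset, return the image the camera
    would see from the perturbed pose (here we simply return the input image)."""
    return expert_image

def make_augmenting_samples(images, poses, actions, num_perturb=4, noise_scale=0.01, rng=None):
    """For each expert step, sample off-trajectory wrist poses, synthesize the
    corresponding views, and label them with corrective actions that steer the
    end-effector back toward the next expert waypoint."""
    rng = np.random.default_rng() if rng is None else rng
    augmented = []
    for t in range(len(actions)):
        next_waypoint = poses[t] + actions[t]                 # where the expert goes next
        for _ in range(num_perturb):
            offset = rng.normal(scale=noise_scale, size=3)    # perturb the wrist pose
            perturbed_pose = poses[t] + offset
            view = synthesize_view(images[t], offset)         # diffusion-generated view
            corrective_action = next_waypoint - perturbed_pose  # label: converge back to the expert path
            augmented.append((view, corrective_action))
    return augmented

# Toy usage: a 3-step expert trajectory with 64x64 RGB wrist images.
images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(3)]
poses = [np.array([0.00, 0.0, 0.3]), np.array([0.02, 0.0, 0.3]), np.array([0.04, 0.0, 0.3])]
actions = [poses[1] - poses[0], poses[2] - poses[1], np.zeros(3)]
augmenting_dataset = make_augmenting_samples(images, poses, actions)
print(len(augmenting_dataset))  # 12 synthesized (view, corrective action) pairs
```

The synthesized views and their corrective labels are then combined with the original task data for policy learning, as described in stage c) above.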
DMD Robotic Experiments
There are four tasks in total: pushing an apple to a target location, stacking five different cups on a box, pouring coffee beans into a cup, and hanging a shirt on a rack. We conduct our experiments on a Franka Research 3 robot with a wrist-mounted GoPro Hero 9.
Non-prehensile Pushing
(a) DMD vs. BC
DMD outperforms BC across all settings. DMD achieves a 100% success rate when pushing an apple, greatly exceeding BC’s 30%. It also maintains an 80% success rate with only 8 demonstrations, whereas BC drops to 20%.
*This video contains multiple sections for different experiments
(b) DMD vs. SPARTN
Our diffusion model synthesizes higher-quality images than the NeRFs used by SPARTN, especially when scenes undergo deformations. This advantage translates into higher task performance: DMD achieves a 100% success rate, while SPARTN achieves only 50%.
(Note that this DMD-24-demos video is different from the DMD-24-demos video above because they are from two different pairwise randomized A/B tests.)
(c) Utility of Play Data
Training the diffusion model with additional play data boosts the task success rate to 100%, compared to 80% when using the model trained only on task data.
(Note that this DMD-24-demos video is different from the previous DMD-24-demos videos because they are all from different pairwise randomized A/B tests.)
Task & Play Data
Only Task Data
Stacking
*This video contains multiple sections for different experiments
Pouring
*Recorded with a third-person camera to better view the amount of coffee beans transferred or spilled. This view is not input into the policy.
Hanging a Shirt
*Recorded with a third-person camera for a clearer view of the task. This view is not input into the policy.
In-the-Wild Cup Arrangement
We leverage a diverse in-the-wild dataset from the recent Universal Manipulation Interface (UMI) paper. We adopt the same task definition as the in-the-wild generalization experiment in UMI: placing a cup on a saucer with its handle facing the left side of the robot. UMI collected 1447 demonstrations across 30 locations and 18 training cups.
We use their publicly available demonstration data and conduct evaluation in our lab (i.e., a novel location) with and without DMD. We test on 5 held-out cups, each from 5 different start configurations. We follow the experiment protocol outlined in UMI: we use pixel masks to ensure that the starting locations of the cups and saucers are the same across the two methods.
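As a rough illustration of how such a pixel-mask check could work (the IoU criterion, threshold, and function names below are our assumptions, not the exact protocol from the UMI paper):

```python
# Rough illustration of a pixel-mask check for matched start configurations.
# The IoU criterion, threshold, and function names are assumptions,
# not the exact protocol from the UMI or DMD papers.
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-union between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def start_configs_match(reference_mask, current_mask, thresh=0.9):
    """Treat two rollouts as starting identically when the cup/saucer region in
    the current frame overlaps the reference region by at least `thresh` IoU."""
    return iou(reference_mask, current_mask) >= thresh

# Toy usage with synthetic 8x8 masks marking the cup-and-saucer region.
reference = np.zeros((8, 8), dtype=bool); reference[2:6, 2:6] = True
current = np.zeros((8, 8), dtype=bool);   current[2:6, 2:6] = True
print(start_configs_match(reference, current))  # True
```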
DMD
Diffusion Policy