A critical challenge for robotic learning is scale: acquiring a dataset large enough for the system to generalize effectively.
Imitation learning is a stable and powerful approach for robot learning, but it requires human operators for data collection.
In this work, we aim to build an imitation learning system that:
Improves through autonomously collected data
Avoids the explicit use of reinforcement learning or task-specific rewards
Our method, which we call MILI, builds on the following insight: in a multi-task setting, a failed attempt at one task might represent a successful attempt at another task. This allows us to use rollouts from the learned meta-policy itself to improve its performance.
We start with a multi-task dataset of human demonstrations, and apply a meta-imitation learning algorithm to learn a demo-conditioned policy.
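For concreteness, here is a minimal PyTorch sketch of this step; the class and function names, network sizes, and the behavior-cloning loss are illustrative assumptions, not the actual architecture:

```python
import torch
import torch.nn as nn

class DemoConditionedPolicy(nn.Module):
    """Maps (demonstration encoding, current observation) -> action. Hypothetical module."""
    def __init__(self, obs_dim, act_dim, demo_dim, hidden=128):
        super().__init__()
        self.demo_encoder = nn.Sequential(
            nn.Linear(demo_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, demo, obs):
        # Condition the action prediction on an encoding of the demonstration.
        z = self.demo_encoder(demo)
        return self.policy(torch.cat([obs, z], dim=-1))

def meta_imitation_step(policy, optimizer, demo, obs, expert_action):
    # Behavior cloning: given a demo of the task, regress the expert action.
    pred = policy(demo, obs)
    loss = ((pred - expert_action) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```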
We use the same initially provided dataset to also learn a task embedding model, trained with a contrastive loss that pulls demos from the same task close together in the latent space and pushes demos from different tasks apart.
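A minimal sketch of such a contrastive objective; a margin-based pairwise loss is one common choice, and the actual loss used may differ:

```python
import torch
import torch.nn.functional as F

def contrastive_task_loss(emb_a, emb_b, same_task, margin=1.0):
    # emb_a, emb_b: embeddings of two demos.
    # same_task: 1.0 if the demos come from the same task, 0.0 otherwise.
    dist = F.pairwise_distance(emb_a, emb_b)
    pull_together = same_task * dist.pow(2)                          # same task: shrink distance
    push_apart = (1.0 - same_task) * F.relu(margin - dist).pow(2)    # different tasks: at least `margin` apart
    return (pull_together + push_apart).mean()
```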
We use the meta-policy learned in step 1 to collect trials in new environments by conditioning it on random demonstrations from the original dataset. We embed these trials into the latent task space using the embedding model learned in step 2.
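A sketch of the collection step, assuming a simple environment interface (env.reset()/env.step()) and an embedding model that consumes a whole trajectory; these interfaces are illustrative, not the actual setup:

```python
import random
import torch

def collect_and_embed_trials(policy, embed_model, demo_dataset, envs, horizon=100):
    trials = []
    for env in envs:
        demo = random.choice(demo_dataset)   # condition on a random demo from the original dataset
        obs = env.reset()
        trajectory = []
        for _ in range(horizon):
            with torch.no_grad():
                action = policy(demo, obs)
            obs, done = env.step(action)     # assumed environment API
            trajectory.append((obs, action))
            if done:
                break
        with torch.no_grad():
            z = embed_model(trajectory)      # embed the trial into the latent task space
        trials.append((trajectory, z))
    return trials
```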
In order to use the trials to learn an improved meta-policy, we need to organize them into tasks. We do so by finding trials that are close to each other in the latent space and grouping them to generate new tasks. We then perform meta-imitation on this new dataset, in addition to the original human demo dataset.
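One simple way to realize the grouping is a greedy threshold-based clustering in the latent space; the threshold and the clustering scheme below are assumptions, and any method that groups nearby embeddings would fit the description above:

```python
import torch

def group_trials_into_tasks(trials, distance_threshold=0.5):
    # trials: list of (trajectory, latent_embedding) pairs.
    groups = []
    for traj, z in trials:
        placed = False
        for group in groups:
            # Assign the trial to the first group whose anchor embedding is close enough.
            if torch.norm(z - group["anchor"]) < distance_threshold:
                group["trajectories"].append(traj)
                placed = True
                break
        if not placed:
            groups.append({"anchor": z, "trajectories": [traj]})
    # Groups with at least two trials can serve as new tasks: one trial acts as
    # the conditioning demo, the others as imitation targets.
    return [g["trajectories"] for g in groups if len(g["trajectories"]) > 1]
```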
The MILI algorithm: We bootstrap a one-shot imitation policy using a multi-task imitation learning dataset. We then use this policy to collect trials in new environments. A latent task space, learned from the same initial dataset, is then used to find similarities among the collected trials and generate new tasks for meta-imitation learning. We update our meta-policy using the newly collected data, and repeat this process until convergence.
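Putting the pieces together, the outer loop might look as follows; train_meta_policy and train_task_embedding are hypothetical helpers standing in for steps 1 and 2, the data structures are simplified to plain lists, and a fixed iteration count stands in for the convergence check:

```python
def mili(demo_dataset, envs, num_iterations=5):
    policy = train_meta_policy(demo_dataset)           # step 1: bootstrap the meta-policy (hypothetical helper)
    embed_model = train_task_embedding(demo_dataset)   # step 2: learn the latent task space (hypothetical helper)
    for _ in range(num_iterations):                    # in practice: repeat until convergence
        trials = collect_and_embed_trials(policy, embed_model, demo_dataset, envs)  # step 3
        new_tasks = group_trials_into_tasks(trials)                                 # step 4
        # Meta-imitation on the generated tasks in addition to the original demos.
        policy = train_meta_policy(demo_dataset + new_tasks)
    return policy
```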