Human-in-the-Loop Imitation Learning using Remote Teleoperation
Video Summary
Motivation
Imitation Learning suffers from covariate shift
Small action errors can lead to unseen states, causing failure.
Intervention-based Policy Learning
Allowing a human to intervene during policy rollouts and provide corrections can mitigate these issues. However, most prior methods have been limited to 2D driving domains, which are substantially more tolerant to error than manipulation.
Bottlenecks in Manipulation Tasks
Human Demonstration
Insertion requires a precise sequence of actions - we call such regions of the state space bottlenecks.
Policy Execution
Small deviations near bottleneck regions can cause a trained policy to fail.
Contributions
We develop a system that enables remote teleoperation for 6-DoF robot control and a natural human intervention mechanism well suited to robot manipulation.
We introduce Intervention Weighted Regression (IWR), a simple yet effective method to learn from human interventions that encourages the policy to learn how to traverse bottlenecks through the interventions.
We evaluate our system and method on two challenging contact-rich manipulation tasks: a threading task and coffee machine task. We demonstrate that (1) policies trained on data collected by our system outperform policies trained on an equivalent amount of full human demonstration trajectories, (2) IWR outperforms alternatives for learning from the intervention data, and (3) our results hold across data collected from multiple human operators.
Remote Teleoperation for Collecting Interventions
Our system allows operators to remotely monitor trained policies and intervene when necessary. An operator only needs a smartphone and a web browser to participate in data collection. The operator watches the trained policy in a video stream until they decide to intervene. During an intervention, they move their phone in free space to apply relative pose commands to the robot arm. This provides a natural way for users to apply corrections.
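The mapping from phone motion to robot commands can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: the function name `relative_pose_command` and the clipping parameters are hypothetical, and only the translational part of the 6-DoF command is shown.

```python
import numpy as np

def relative_pose_command(prev_pos, curr_pos, gain=1.0, max_step=0.05):
    """Map a frame-to-frame phone translation to a clipped delta-position
    command for the robot end effector (hypothetical interface)."""
    # Relative motion of the phone since the last frame, scaled by a gain.
    delta = gain * (np.asarray(curr_pos, dtype=float) - np.asarray(prev_pos, dtype=float))
    # Clip each axis so a sudden phone movement cannot command a large jump.
    return np.clip(delta, -max_step, max_step)
```

Clipping per-axis motion is one plausible safety choice for smartphone teleoperation, since hand jitter and tracking glitches would otherwise translate directly into arm motion.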
Intervention Weighted Regression (IWR)
Our method partitions the collected data into intervention and non-intervention samples, and then samples them during training in equal proportion. This effectively re-weights the data to prioritize interventions while regularizing the policy to stay close to the policy used for data collection.
We apply IWR iteratively, alternating between collecting data with a human and the latest policy, and training with IWR to update the policy.
IWR algorithm block
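The equal-proportion sampling scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the dataset is already partitioned into intervention and non-intervention samples, and the function name `sample_iwr_batch` is our own, not from the paper's code.

```python
import random

def sample_iwr_batch(intervention_data, non_intervention_data, batch_size):
    """Draw a training batch with intervention and non-intervention samples
    in equal proportion, re-weighting the (typically rarer) interventions."""
    half = batch_size // 2
    # Human corrections, concentrated near bottlenecks.
    batch = random.choices(intervention_data, k=half)
    # Samples from the data-collection policy's own rollouts, which act as
    # a regularizer keeping the learned policy close to that policy.
    batch += random.choices(non_intervention_data, k=batch_size - half)
    random.shuffle(batch)
    return batch
```

Because interventions are usually a small fraction of the collected trajectories, sampling the two partitions in equal proportion up-weights the corrective behavior without discarding the non-intervention data.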
Multi-Stage Manipulation Tasks with Bottlenecks
Threading
The robot must thread a rod into a wooden ring. The task contains two bottlenecks - grasping the rod and inserting it into the ring. The insertion must be performed carefully, since the ring moves easily if struck by the rod.
Coffee Machine
The robot must prepare a cup of coffee. The task contains three bottlenecks - grasping the pod, inserting it into the machine, and closing the lid. Grasping and inserting the pod require precision - small errors can cause the pod to slip out of the gripper or fail to seat in the machine.
The robot needs to generalize to a diverse distribution of task instances.
Data Collection Study
We collected data from 3 different operators, who differed in skill level and produced data of varying quality.
Operator 1 (Experienced)
Operator 3 (Inexperienced)
Intervention data outperforms full human demonstrations
IWR outperforms other intervention-based algorithms
IWR can learn from data collected by other intervention-based algorithms
Common Mistakes and Corrections
Most mistakes and corrections occur near bottleneck regions.
Qualitative Policy Performance