Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation
Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar,
Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín
Video Summary
Overview
Motivation
Mobile Manipulation (MM) endows robots with greater capabilities than static manipulation or navigation alone, but suffers from a vast state space that is difficult to explore
Imitation Learning (IL) is a promising paradigm that has shown potential in static manipulation and navigation, but has yet to be rigorously explored in general settings
We observe two key challenges limiting IL from being applied to MM:
(1) Lack of an intuitive demonstration interface -- Prior MM interfaces for collecting demonstrations are scarce and complex to use. Allowing an operator to naturally control a mobile manipulator's locomotion AND manipulation capabilities is non-trivial.
(2) Vast State Space -- It is unlikely that IL demonstrations can sufficiently cover all states for robust policy learning in the MM domain. A policy trained on a finite demonstration dataset is likely to suffer from covariate shift during rollouts.
Our Contributions
MoMaRT. We present a novel teleoperation system that enables intuitive and expressive teleoperation of mobile manipulation robots
First-of-its-kind Dataset. We collect a novel continuous-control dataset in a realistic simulated kitchen domain, consisting of over 1200 successful demonstrations across five long-horizon tasks, with multiple sensor modalities and ablation subsets featuring domain randomization
Performant Policies + Error Detectors. We train performant IL task policies that reach over 45% success across all tasks, and augment these policies with a learned error detector model that can accurately detect when the agent is in a failure state and immediately terminate, achieving over 85% precision and recall
MoMaRT: Mobile Manipulation RoboTurk
Controlling mobile manipulators can be easy
MoMaRT enables intuitive control of all the degrees of freedom of a mobile manipulator, allowing a user to directly teleoperate a mobile manipulator with their smartphone. We assume the mobile manipulator consists of a base and an arm, such as the Fetch or PAL Tiago robots. We highlight the key features below; a rough sketch of the control mapping follows the list:
Synchronous arm and base control: A user can move the base by moving the virtual joystick while simultaneously controlling the arm through the smartphone's 6DOF motion.
Realistic Teleoperation: The operator sees what the robot sees, and is constrained to the robot's head camera for visual feedback. This guarantees that the demonstrator and downstream learned agent share the same partial observability constraints, which can be useful for learning.
Passive Head Control: The head is fixed horizontally and moves vertically to keep the end effector in frame. This reduces the operator's cognitive burden while also making learning easier due to the reduced action space.
Reset Action: An operator can press a button to force the arm to return to a pre-defined default stable configuration. Because MM tasks can be long-horizon, it can be easy for the arm to end up in an unstable configuration. Including this action allows the operator to recover to well-known configurations between subtasks, which implicitly eases downstream learning by narrowing the distribution of arm configurations seen during demonstrations.
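To make the control mapping concrete, here is a minimal sketch of how the smartphone inputs could be composed into a single per-timestep action. The action layout, dimensions, and scaling factors below are illustrative assumptions, not the released MoMaRT interface.

```python
import numpy as np

def momart_style_action(joystick_xy, phone_delta_pose, reset_pressed,
                        base_scale=0.5, arm_scale=1.0):
    """Compose a mobile-manipulation action from smartphone inputs.

    joystick_xy:      (2,) virtual joystick reading -> base linear/angular velocity
    phone_delta_pose: (6,) phone motion since the last step -> end-effector delta (xyz + rpy)
    reset_pressed:    bool -> snap the arm back to a pre-defined stable configuration

    The layout and scaling here are illustrative assumptions, not the released
    MoMaRT interface.
    """
    base_cmd = base_scale * np.asarray(joystick_xy, dtype=np.float32)         # (2,)
    arm_cmd = arm_scale * np.asarray(phone_delta_pose, dtype=np.float32)      # (6,)
    reset_flag = np.array([1.0 if reset_pressed else 0.0], dtype=np.float32)  # (1,)
    return np.concatenate([base_cmd, arm_cmd, reset_flag])                    # (9,)
```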
Large-Scale Mobile Manipulation Dataset
1200+ Demonstrations
11+ Hours
5 Long-Horizon Kitchen Tasks
Setup Table from Dishwasher
The robot must navigate to the dishwasher, grab the bowl, and place it on the table. This requires contact-rich, arm-base coordinated interaction with large constrained mechanisms.
Setup Table from Dresser
Instead of grabbing the bowl from the dishwasher, the robot must search dresser drawers for the bowl; this evaluates an agent's ability to contextualize its observations on prior actions.
Table Cleanup to Dishwasher
The robot must pick up the dirty bowl from the table, dump the trash into the trash can, and place the bowl in the dishwasher. Dumping the trash requires accurate arm positioning and large arm motion.
Table Cleanup to Sink
Instead of placing the dirty bowl in the dishwasher, the robot must navigate to the sink and place it in the basin. This task evaluates an agent's ability to navigate across the room.
Unload Dishwasher to Dresser
The robot must grab the clean bowl from the dishwasher and place it in the dresser. In addition to the other task properties mentioned, travelling from the dishwasher to the dresser tests an agent's ability to avoid obstacles based on estimated visual states.
Diverse Demonstrations
Our demonstrations consist of multiple subsets: expert demonstrator trajectories, suboptimal demonstrator trajectories, and few-shot generalization trajectories. Demonstrator strategies can vary between the expert and suboptimal demonstrators, and showcase multiple solutions for solving the same task.
Expert Demonstrator Strategy
Approach the dresser from the side
Suboptimal Demonstrator Strategy
Approach the dresser from the front
Error Detection
Learning to know what to do is hard; learning when we don't know what to do is much easier
Detecting errors in MM is important for safety-critical applications, and is often necessary because partial observability makes robust policy learning difficult. We therefore propose a simple but effective method for detecting out-of-distribution states. A conditional variational autoencoder (cVAE) learns to encode and decode the present visual state, conditioned on a past visual state.
This conditioning couples the likelihood of the present state to the past state, inducing a temporal dependency in the cVAE. During rollouts, we expect a low reconstruction error (epsilon) for all "good" states, i.e. states observed during training that correspond to successful task trajectories. In contrast, a high reconstruction error suggests that the state has not been seen before, which likely corresponds to a "bad" state, i.e. a failure state caused by policy errors.
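As a minimal illustration of this idea, the sketch below implements a toy cVAE that reconstructs the current camera frame conditioned on a past frame, and exposes the per-sample reconstruction error as the out-of-distribution score. The architecture, 64x64 RGB input size, and the name PastConditionedVAE are our own assumptions rather than the exact model from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PastConditionedVAE(nn.Module):
    """Toy cVAE: reconstruct the current 64x64 RGB frame conditioned on a past frame."""

    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder sees the current and past frames stacked along the channel axis.
        self.enc = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.to_mu = nn.LazyLinear(latent_dim)
        self.to_logvar = nn.LazyLinear(latent_dim)
        # Decoder receives the latent plus the flattened past frame as conditioning.
        self.dec = nn.Sequential(
            nn.LazyLinear(64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, current, past):
        h = self.enc(torch.cat([current, past], dim=1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.dec(torch.cat([z, past.flatten(1)], dim=1))
        return recon, mu, logvar

def vae_loss(model, current, past):
    """Standard ELBO-style training loss: reconstruction term plus KL regularizer."""
    recon, mu, logvar = model(current, past)
    recon_loss = F.mse_loss(recon, current)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

def reconstruction_error(model, current, past):
    """Per-sample reconstruction error, used as the out-of-distribution score."""
    with torch.no_grad():
        recon, _, _ = model(current, past)
    return F.mse_loss(recon, current, reduction="none").flatten(1).mean(dim=1)
```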
We directly leverage this error to decide what to do. For a given timestep, if the error exceeds a threshold, our error detector intervenes: it first attempts a recovery, bringing the agent back to a state more likely to have been seen during training, and then immediately terminates the episode if the recovery is unsuccessful. These interventions are proactive and can mitigate undefined or potentially destructive policy behavior once an error is detected.
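A hedged sketch of this rollout-time logic is shown below; the threshold, recovery budget, and the policy/env/recovery_action interfaces are illustrative placeholders rather than the paper's exact API.

```python
def rollout_with_error_detector(env, policy, detector_error, threshold,
                                recovery_action, max_recovery_steps=10):
    """Run a policy, intervening when the error detector flags the current state.

    detector_error(obs, past_obs) -> scalar reconstruction error (see sketch above)
    recovery_action(obs)          -> action intended to return to familiar states
                                     (e.g. the arm-reset action)
    All interfaces here are illustrative placeholders, not the paper's exact API;
    a gym-style env with (obs, reward, done, info) steps is assumed.
    """
    obs, past_obs, done = env.reset(), None, False
    while not done:
        if past_obs is not None and detector_error(obs, past_obs) > threshold:
            # Out-of-distribution: try to recover first, terminate if the error stays high.
            for _ in range(max_recovery_steps):
                past_obs, obs = obs, env.step(recovery_action(obs))[0]
                if detector_error(obs, past_obs) <= threshold:
                    break  # back in distribution, hand control back to the policy
            else:
                return "terminated_by_error_detector"
        past_obs = obs
        obs, reward, done, info = env.step(policy(obs))
    return "episode_finished"
```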
Imitation Learning Can Solve Our Tasks
When trained on expert data, our learned policies can already succeed, achieving 48-68% success. This shows IL is viable for MM.
Setup Table from Dishwasher: 66%
Setup Table from Dresser: 68%
Table Cleanup to Dishwasher: 48%
Table Cleanup to Sink: 61%
Unload Dishwasher to Dresser: 48%
Error Detectors Can Consistently Detect Out-of-Distribution States
When trained on expert data, our learned error detectors can achieve over 85% precision (P) and recall (R). This shows that our error detectors can reliably detect errors despite the vast state space.
Setup Table from Dishwasher: 88% P, 100% R
Setup Table from Dresser: 100% P, 86% R
Table Cleanup to Dishwasher: 96% P, 100% R
Table Cleanup to Sink: 97% P, 87% R
Unload Dishwasher to Dresser: 86% P, 100% R
Qualitative Analysis
We showcase a comparison between our learned policy without an error detector (left), and with an error detector (second from left), as well as a visualization of the error detector's reconstructed state (second from right) and corresponding reconstruction error (right). Our error detector can reliably terminate execution during failure modes, and surprisingly, sometimes even recover to solve the task where the original policy would have failed!
Terminate example: Policy (no Error Detector) | Policy + Error Detector | Reconstructed State | Reconstruction Error
Recover example: Policy (no Error Detector) | Policy + Error Detector | Reconstructed State | Reconstruction Error
Error Detectors Can Generalize In Few-Shot Setting
After relocating key furniture items, our error detectors can generalize, usually achieving over 78% precision (P) and recall (R). This shows that error detection can also be applied to generalization settings with only a small number of additional samples.
Policy
Setup Table from Dishwasher: 66%
Setup Table from Dishwasher (Moved Dishwasher): 24%
Error Detection
Setup Table from Dishwasher: 88% P, 100% R
Setup Table from Dishwasher (Moved Dishwasher): 78% P, 100% R