Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation

Josiah Wong, Albert Tung, Andrey Kurenkov, Ajay Mandlekar,

Li Fei-Fei, Silvio Savarese, Roberto Martín-Martín

Video Summary

Overview

Motivation

  • Mobile Manipulation (MM) endows robots with greater capabilities than static manipulation or navigation alone, but suffers from a vast state space that is difficult to explore.

  • Imitation Learning (IL) is a promising paradigm that has shown potential in static manipulation and navigation, but has yet to be rigorously explored in more general settings.

  • We observe two key challenges that limit the application of IL to MM:

  • (1) Lack of an intuitive demonstration interface -- Few prior MM interfaces exist for collecting demonstrations, and those that do are complex to use. Allowing an operator to naturally control both the locomotive AND manipulative capabilities of a mobile manipulator is non-trivial.

  • (2) Vast State Space -- It is unlikely that IL demonstrations can sufficiently cover all states for robust policy learning in the MM domain. A policy trained on a finite demonstration dataset is likely to suffer from covariate shift during rollouts.

RT_IG_Pull.mp4

Our Contributions

  1. MoMaRT. We present a novel system that enables intuitive and expressive teleoperation of mobile manipulators.

  2. First-of-its-kind Dataset. We collect a novel continuous-control dataset in a realistic simulated kitchen domain, consisting of over 1200 successful demonstrations across five long-horizon tasks, with multiple sensor modalities and ablation subsets featuring domain randomization.

  3. Performant Policies + Error Detectors. We train performant IL task policies that reach over 45% success across all tasks, and augment them with a learned error detector that accurately identifies when the agent is in a failure state and immediately terminates the episode, achieving over 85% precision and recall.

MoMaRT: Mobile Manipulation RoboTurk

RT_Teleop_Overlay.mp4

Controlling mobile manipulators can be easy

MoMaRT enables intuitive control of all the degrees of freedom of a mobile manipulator, allowing a user to directly teleoperate the robot with their smartphone. We assume the mobile manipulator consists of a base and an arm, such as the Fetch or PAL Tiago robots. We highlight the key features below; a minimal sketch of one possible control mapping follows the list.

  • Synchronous arm and base control: A user can move the base by moving the virtual joystick while simultaneously controlling the arm through the smartphone's 6DOF motion.

  • Realistic Teleoperation: The operator sees what the robot sees, and is constrained to the robot's head camera for visual feedback. This guarantees that the demonstrator and downstream learned agent share the same partial observability constraints, which can be useful for learning.

  • Passive Head Control: The head is fixed horizontally and moves vertically to keep the end effector in frame. This reduces the operator's cognitive burden while also making learning easier due to the reduced action space.

  • Reset Action: An operator can press a button to force the arm to return to a pre-defined default stable configuration. Because MM tasks can be long-horizon, it can be easy for the arm to end up in an unstable configuration. Including this action allows the operator to recover to well-known configurations between subtasks, which implicitly eases downstream learning by narrowing the distribution of arm configurations seen during demonstrations.
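To make the mapping above concrete, here is a minimal sketch of what one control step could look like. The phone/robot interfaces (`phone.joystick`, `robot.send_base_velocity`, `robot.send_ee_delta`, etc.), the gains, and the head-tracking law are illustrative assumptions, not the actual MoMaRT implementation.

```python
import numpy as np

def teleop_step(phone, robot, base_gain=0.5, arm_gain=1.0):
    """One hypothetical control step mapping a smartphone reading to robot commands."""
    # Virtual joystick -> planar base velocity (runs synchronously with arm control).
    forward, turn = phone.joystick  # both in [-1, 1]
    robot.send_base_velocity(linear=base_gain * forward, angular=base_gain * turn)

    # 6-DoF phone motion -> end-effector delta pose (position + relative rotation).
    delta_pos = arm_gain * (phone.position - phone.prev_position)
    delta_rot = phone.orientation_delta  # rotation since the last step
    robot.send_ee_delta(delta_pos, delta_rot)

    # Passive head control: pitch the (horizontally fixed) head to keep the EE in frame.
    ee = robot.ee_position_in_head_frame()
    robot.set_head_pitch(np.arctan2(-ee[2], ee[0]))

    # Reset action: return the arm to a pre-defined stable configuration.
    if phone.reset_pressed:
        robot.move_arm_to(robot.default_arm_q)
```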

Large-Scale Mobile Manipulation Dataset

1200+ Demonstrations

11+ Hours

5 Long-Horizon Kitchen Tasks

set_table_dishwasher_expert_demo_playback_external.mp4

Setup Table from Dishwasher

The robot must navigate to the dishwasher, grab the bowl, and place it on the table. This requires contact-rich, arm-base-coordinated interaction with large constrained mechanisms.

set_table_drawer_expert_demo_playback_external.mp4

Setup Table from Dresser

Instead of grabbing the bowl from the dishwasher, the robot must search dresser drawers for the bowl; this evaluates an agent's ability to contextualize its observations on prior actions.

table_cleanup_dishwasher_expert_demo_playback_external.mp4

Table Cleanup to Dishwasher

The robot must pick up the dirty bowl from the table, dump the trash into the trash can, and place the bowl in the dishwasher. Dumping the trash requires accurate arm positioning and large arm motions.

table_cleanup_sink_expert_demo_playback_external.mp4

Table Cleanup to Sink

Instead of placing the dirty bowl in the dishwasher, the robot must navigate to the sink and place it in the basin. This variant evaluates an agent's ability to navigate across the room.

unload_dishwasher_expert_demo_playback_external.mp4

Unload Dishwasher to Dresser

The robot must grab the clean bowl from the dishwasher and place it in the dresser. In addition to the other task properties mentioned, travelling from the dishwasher to dresser tests an agent's ability to avoid obstacles based on estimated visual states.

Diverse Demonstrations

Our demonstrations consist of multiple subsets: expert demonstrator trajectories, suboptimal demonstrator trajectories, and few-shot generalization trajectories. Demonstrator strategies can vary between the expert and suboptimal demonstrators, and showcase multiple solutions for solving the same task.

set_table_drawer_expert_demo_playback_external.mp4

Expert Demonstrator Strategy

Approach the dresser from the side

set_table_drawer_demonstrator2_demo_playback_external.mp4

Suboptimal Demonstrator Strategy

Approach the dresser from the front

Error Detection

Learning what to do is hard; learning to recognize when we don't know what to do is much easier

Detecting errors in MM is important for safety-critical applications, and it is often necessary because partial observability makes robust policy learning difficult. We therefore propose a simple but effective method for detecting out-of-distribution states: a conditional variational autoencoder (cVAE) learns to encode and decode the present visual state, conditioned on a past visual state.

This conditioning couples the likelihood of the present state to the past, inducing a temporal dependency in the cVAE. During rollouts, we expect a low reconstruction error (epsilon) for all "good" states -- that is, states observed during training that correspond to successful task trajectories. In contrast, a high reconstruction error suggests that the state has not been seen before, which likely corresponds to a "bad" state, i.e., a failure state caused by policy errors.
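As a rough illustration, the reconstruction error can be computed per timestep as below. The interface of `cvae` (taking the current and past frames and returning a reconstruction of the current frame) and the use of pixel-wise MSE are assumptions for this sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def reconstruction_error(cvae, current_frame, past_frame):
    """Reconstruction error epsilon for the current visual state, conditioned on a past one."""
    recon = cvae(current_frame, past_frame)         # decode the present frame given the past
    return F.mse_loss(recon, current_frame).item()  # high epsilon -> likely unseen state
```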

We directly leverage this error to decide what to do. At a given timestep, if the error exceeds a threshold, our error detector intervenes: it first attempts a recovery that brings the agent back toward states likely to have been seen during training, and then immediately terminates the episode if the recovery is unsuccessful. These interventions are proactive and can mitigate undefined or potentially destructive policy behavior as soon as it is detected.
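A sketch of this intervention logic during a rollout is shown below. The threshold value, the recovery budget, the `recovery_action` helper, and the assumption that `env.step` returns an (observation, done) pair are all hypothetical choices for illustration.

```python
def error_aware_rollout(env, policy, detect_eps, threshold, max_recover_steps=10):
    """Roll out a policy, recovering or terminating when reconstruction error is high."""
    obs, done = env.reset(), False
    recover_left = max_recover_steps
    while not done:
        if detect_eps(obs) <= threshold:
            action, recover_left = policy(obs), max_recover_steps          # in-distribution: act normally
        elif recover_left > 0:
            action, recover_left = recovery_action(obs), recover_left - 1  # try to get back in-distribution
        else:
            break                                                          # recovery failed: terminate
        obs, done = env.step(action)
    return obs, done
```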

Imitation Learning Can Solve Our Tasks

When trained on expert data, our learned policies achieve 48-68% success across tasks, showing that IL is viable for MM.

pp_set_table_dishwasher_success.mov

Setup Table from Dishwasher
66%

pp_set_table_drawer_success.mp4

Setup Table from Dresser
68%

pp_table_cleanup_dishwasher_success.mov

Table Cleanup to Dishwasher
48%

pp_table_cleanup_sink_success.mov

Table Cleanup to Sink
61%

pp_unload_dishwasher_success.mov

Unload Dishwasher to Dresser
48%

Error Detectors Can Consistently Detect Out-of-Distribution States

When trained on expert data, our learned error detectors achieve over 85% precision (P) and recall (R), showing that they can reliably detect errors despite the vast state space.

pp_set_table_dishwasher_errors_updated.mov

Setup Table from Dishwasher
88% P, 100% R

pp_set_table_drawer_errors_updated.mov

Setup Table from Dresser
100% P, 86% R

pp_table_cleanup_dishwasher_errors_updated.mp4

Table Cleanup to Dishwasher
96% P, 100% R

pp_table_cleanup_sink_errors_updated.mp4

Table Cleanup to Sink
97% P, 87% R

pp_unload_dishwasher_errors_updated.mov

Unload Dishwasher to Dresser
86% P, 100% R

Qualitative Analysis

We showcase a comparison between our learned policy without an error detector (left) and with an error detector (second from left), as well as a visualization of the error detector's reconstructed state (second from right) and the corresponding reconstruction error (right). Our error detector can reliably terminate execution during failure modes and, surprisingly, sometimes even recovers to solve the task where the original policy would have failed!

Terminate

Policy (no Error Detector)

Policy + Error Detector

Reconstructed State

Reconstruction Error

set_table_drawer_errors_updated.mov

Recover

Policy (no Error Detector)

Policy + Error Detector

Reconstructed State

Reconstruction Error

set_table_drawer_recovery.mov_postprocessed.mp4

Error Detectors Can Generalize in a Few-Shot Setting

After relocating key furniture items, our error detectors generalize, achieving at least 78% precision (P) and recall (R). This shows that error detection can also be applied to generalization settings with only a small number of additional samples.

Policy

pp_set_table_dishwasher_errors_updated.mov

Setup Table from Dishwasher
66%

pp_set_table_dishwasher_generalize_errors_updated.mov

Setup Table from Dishwasher (Moved Dishwasher)
24%

Error Detection

pp_set_table_dishwasher_success.mov

Setup Table from Dishwasher
88% P, 100% R

pp_set_table_dishwasher_generalize_success.mov

Setup Table from Dishwasher (Moved Dishwasher)
78% P, 100% R