Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

Charles Sun*, Jędrzej Orbik*, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, and Sergey Levine

Berkeley AI Research

*Denotes equal contribution


Paper: https://arxiv.org/abs/2107.13545

Blog post: https://bair.berkeley.edu/blog/2023/01/20/relmm/

Abstract

In this paper, we study how mobile manipulators can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. These difficulties have often been sidestepped by limiting the robot to manipulation or navigation alone and by using human effort to provide demonstrations, task resets/randomizations, and data labeling during the training process. In this work, we specifically study how a robot can autonomously learn to clean different rooms by collecting objects off the ground and putting them into a basket. Our goal is to enable a robot to learn this task autonomously under realistic settings: without environment instrumentation, with minimal human intervention, and without access to privileged information such as maps, object positions, or a global view of the environment. We propose a novel learning system, ReLMM, that achieves this goal through a modularized policy for grasping and navigation, where uncertainty over the grasping policy drives exploration and the navigation policy is rewarded only by grasp success. We show that with ReLMM, after a brief pretraining phase, a robot can learn to navigate and clean up a room autonomously.

Video summary of results


Method

Our method for fully autonomous real-world reinforcement learning consists of the following components (illustrative sketches follow this list):

  • Decompose the policy into grasping and navigation modules, and train both jointly with a grasp-success reward.

  • Train an ensemble of grasping policies to measure uncertainty and focus exploration on novel states.

  • Bootstrap the grasping policy with either a stationary or an automatic curriculum.

  • Run a pseudo-reset behavior after grasps so that training can proceed autonomously.

  • Train policies directly from camera input, avoiding the need for maps, instrumentation, or human labeling.
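
As a concrete illustration of the first two components, the following is a minimal sketch (in PyTorch) of an ensemble of grasp-success predictors whose disagreement serves as the exploration signal. This is not the authors' released code: the network architecture, class names, and the 2D grasp-action parameterization are illustrative assumptions.

import torch
import torch.nn as nn

class GraspCritic(nn.Module):
    """Predicts grasp-success probability from a camera image and a grasp action."""
    def __init__(self, action_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, image, action):
        feat = self.encoder(image)
        return torch.sigmoid(self.head(torch.cat([feat, action], dim=-1)))

class GraspEnsemble:
    """Independently initialized critics; their disagreement estimates epistemic uncertainty."""
    def __init__(self, n_members=5, action_dim=2):
        self.members = [GraspCritic(action_dim) for _ in range(n_members)]

    def success_and_uncertainty(self, image, action):
        preds = torch.stack([m(image, action) for m in self.members])
        # Mean = estimated grasp-success probability; std = disagreement,
        # which is high in novel states and can be used to focus
        # exploration there (e.g., as a bonus on the grasp reward).
        return preds.mean(0), preds.std(0)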
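
The remaining components can likewise be sketched as one autonomous training loop: the robot alternates navigation and grasp attempts, the navigation policy is rewarded only by grasp success, and a pseudo-reset keeps the task going without human resets. Every interface below (env, nav_policy, grasp_ensemble, and the replay buffers) is hypothetical and stands in for the real system's off-policy RL machinery.

def autonomous_training_loop(env, nav_policy, grasp_ensemble, replay_nav, replay_grasp):
    obs = env.get_camera_image()
    while True:
        # Navigation chooses where to drive from the current image alone:
        # no maps, object positions, or global view are assumed.
        nav_action = nav_policy.sample(obs)
        next_obs = env.drive(nav_action)

        # Attempt a grasp; success (0 or 1) is detected automatically,
        # e.g., from the gripper state, so no human labeling is needed.
        grasp_action = grasp_ensemble.sample_exploratory(next_obs)
        success = env.attempt_grasp(grasp_action)

        # The navigation policy is rewarded only by grasp success.
        replay_nav.add(obs, nav_action, float(success), next_obs)
        replay_grasp.add(next_obs, grasp_action, float(success))

        if success:
            # Pseudo-reset: place the object back on the floor so training
            # never runs out of objects and needs no manual resets.
            env.place_object_back()

        nav_policy.update(replay_nav)
        grasp_ensemble.update(replay_grasp)
        obs = env.get_camera_image()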


Training (room without obstacles), 20x speed

Training (room with obstacles), 20x speed

As the training videos show, the only human interventions needed during training were swapping out batteries and moving objects out of ungraspable positions they had been pushed into (e.g., flush against a wall).

Evaluations

All evaluation videos are sped up 20x; for each room, we show one video per evaluated policy.

Room without obstacles

  • Random navigation & grasping

  • Random navigation

  • Scripted policy

  • ReLMM-StatCurr

  • ReLMM-AutoCurr

Diverse objects

  • Random navigation & grasping

  • Random navigation

  • Scripted policy

  • ReLMM-StatCurr

  • ReLMM-AutoCurr

Room with obstacles

  • Random navigation & grasping

  • Random navigation

  • Scripted policy

  • ReLMM-StatCurr

Room with obstacles and rugs

  • Random navigation & grasping

  • Random navigation

  • Scripted policy

  • ReLMM-StatCurr