Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation
Charles Sun*, Jędrzej Orbik*, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, and Sergey Levine
Berkeley AI Research
*Denotes equal contribution
Abstract
In this paper, we study how mobile manipulators can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. These difficulties have often been sidestepped by limiting the robot to only manipulation or navigation, and by using human effort to provide demonstrations, task resets/randomizations, and data labeling during the training process. In this work, we specifically study how a robot can autonomously learn to clean different rooms by collecting objects off the ground and putting them into a basket. Our goal is to enable a robot to learn this task autonomously under realistic settings: without environment instrumentation, with minimal human intervention, and without access to privileged information such as maps, object positions, or a global view of the environment. We propose a novel learning system, ReLMM, that achieves this goal through a modularized policy for grasping and navigation, where uncertainty over the grasping policy drives exploration and the navigation policy is rewarded only by grasp success. We show that with ReLMM, after a brief pretraining phase, a robot can learn to navigate and clean up a room autonomously.
Video summary of results
Method
Our method for fully autonomous real-world reinforcement learning consists of the following:
Decompose the policy into grasping and navigation modules, and train both jointly with a grasp-success reward.
Train an ensemble of grasping policies to measure uncertainty and focus exploration on novel states.
Bootstrap the grasping policy with either a stationary or an automatic curriculum.
Run a pseudo-reset behavior after grasps so that training can proceed autonomously.
Train policies directly from camera input, avoiding the need for maps, instrumentation, or human labeling.
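The uncertainty-driven exploration in the steps above can be sketched as follows. This is a minimal illustration, not the ReLMM implementation: the linear ensemble members, feature dimension, and exploration bonus weight are all placeholder assumptions standing in for the learned grasping networks.

```python
import numpy as np

rng = np.random.default_rng(0)

class GraspEnsemble:
    """Ensemble of independently initialized grasp-success predictors.

    Each member maps an observation feature vector to a grasp-success
    probability; disagreement across members serves as an uncertainty
    signal that concentrates grasp attempts on novel states.
    """

    def __init__(self, n_members=5, feat_dim=64):
        # Hypothetical linear members standing in for small image networks.
        self.weights = rng.normal(size=(n_members, feat_dim))

    def predict(self, features):
        # Per-member predicted probability of grasp success.
        logits = self.weights @ features
        return 1.0 / (1.0 + np.exp(-logits))

    def uncertainty(self, features):
        # Std. dev. across members: high in unfamiliar states, low once
        # the ensemble agrees, so exploration focuses on novel objects.
        return float(np.std(self.predict(features)))

ensemble = GraspEnsemble()
obs = rng.normal(size=64)
# Exploration score: expected success plus an uncertainty bonus
# (the bonus weight 1.0 is an arbitrary placeholder).
score = ensemble.predict(obs).mean() + 1.0 * ensemble.uncertainty(obs)
```

In this sketch, grasp locations with the highest score would be attempted first, so the robot practices where the ensemble is either optimistic or uncertain.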
Training (no obstacles room), 20x speed
Training (obstacles room), 20x speed
As shown in the training videos, the only human interventions needed during training were swapping out batteries and moving objects away from ungraspable positions they had been pushed into (e.g., flush against the wall).
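The autonomous loop that makes this hands-off training possible can be sketched as below. All policies here are random stand-ins for the learned navigation and grasping modules, and the success rates are illustrative assumptions, not measured numbers.

```python
import random

def autonomous_cleanup_loop(max_attempts=200, seed=0):
    """Toy sketch of the training loop: the robot navigates, attempts a
    grasp, rewards navigation only on grasp success, and performs a
    pseudo-reset (returning the object to the floor) so training
    continues without human-provided resets."""
    rng = random.Random(seed)
    nav_rewards = []
    for _ in range(max_attempts):
        # Navigation policy drives toward a candidate object (placeholder).
        near_object = rng.random() < 0.5
        # Grasping policy attempts a pick (placeholder success rate).
        success = near_object and rng.random() < 0.6
        # Navigation reward comes only from grasp success: sparse,
        # and requiring no maps, instrumentation, or human labels.
        nav_rewards.append(1.0 if success else 0.0)
        if success:
            # Pseudo-reset: drop the object back into the room so the
            # next attempt never waits on a human reset.
            pass
    return sum(nav_rewards)
```

The key design choice this illustrates is that grasp success is the sole supervision signal for both modules, which is what lets training run unattended apart from the battery swaps noted above.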
Evaluations
All the presented evaluations have been sped up 20x.
Room without obstacles
Random navigation & grasping
Random navigation
Scripted policy
ReLMM-StatCurr
ReLMM-AutoCurr
Diverse objects
Random navigation & grasping
Random navigation
Scripted policy
ReLMM-StatCurr
ReLMM-AutoCurr
Room with obstacles
Random navigation & grasping
Random navigation
Scripted policy
ReLMM-StatCurr
Room with obstacles and rugs
Random navigation & grasping
Random navigation
Scripted policy
ReLMM-StatCurr