Charles Sun*, Jędrzej Orbik*, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, and Sergey Levine
Berkeley AI Research
*Denotes equal contribution
In this paper, we study how mobile manipulators can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. These difficulties have often been sidestepped by limiting the robot to only manipulation or only navigation, and by using human effort to provide demonstrations, task resets/randomizations, and data labeling during training. In this work, we specifically study how a robot can autonomously learn to clean up different rooms by collecting objects off the ground and putting them into a basket. Our goal is to enable a robot to learn this task autonomously under realistic settings: without environment instrumentation, with minimal human intervention, and without access to privileged information such as maps, object positions, or a global view of the environment. We propose a novel learning system, ReLMM, that achieves this goal through a modularized policy for grasping and navigation, where uncertainty over the grasping policy drives exploration and the navigation policy is rewarded only by grasp success. We show that with ReLMM, after a brief pretraining phase, a robot can learn to navigate and clean up a room autonomously.
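The coupling described above, a navigation module that is rewarded only by whether the following grasp succeeds, can be illustrated with a toy. This is a minimal hypothetical sketch, not ReLMM's implementation: `ToyRoom`, `epsilon_greedy`, `train`, the 10-cell ring world, the four discrete grasp angles, and the one-step tabular value updates are all assumptions standing in for image-based deep RL policies. It also includes an assumed form of pseudo-reset (a grasped object is dropped at a random free cell) so the loop runs without any human reset.

```python
import random

N_CELLS = 10   # hypothetical room: a ring of cells
N_ANGLES = 4   # hypothetical discrete grasp orientations


class ToyRoom:
    """Hypothetical stand-in environment; the robot only sees its current cell."""

    def __init__(self, n_objects=3, seed=0):
        self.rng = random.Random(seed)
        self.objects = {}  # cell -> object orientation
        while len(self.objects) < n_objects:
            self.objects[self.rng.randrange(N_CELLS)] = self.rng.randrange(N_ANGLES)
        self.pos = 0

    def observe(self):
        # Local view only: orientation of the object under the robot, or None.
        return self.objects.get(self.pos)

    def move(self, step):
        self.pos = (self.pos + step) % N_CELLS

    def grasp(self, angle):
        success = self.objects.get(self.pos) == angle
        if success:
            # Pseudo-reset (assumed form): drop the object at a random free cell.
            del self.objects[self.pos]
            free = [c for c in range(N_CELLS) if c not in self.objects]
            self.objects[self.rng.choice(free)] = self.rng.randrange(N_ANGLES)
        return success


def epsilon_greedy(q, context, actions, eps, rng):
    """Pick a random action with prob. eps, else the best (context, action) estimate."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((context, a), 0.0))


def train(steps=20000, eps=0.1, lr=0.1, seed=0):
    rng = random.Random(seed)
    room = ToyRoom(seed=seed)
    nav_q = {}    # (object_in_view, move) -> estimated grasp success
    grasp_q = {}  # (view, angle) -> estimated grasp success
    successes = 0.0
    for _ in range(steps):
        # Navigation acts on a coarse local observation (no map, no object positions).
        has_obj = room.observe() is not None
        move = epsilon_greedy(nav_q, has_obj, [-1, 0, 1], eps, rng)
        room.move(move)
        # Grasping acts on the view after navigating.
        view = room.observe()
        angle = epsilon_greedy(grasp_q, view, list(range(N_ANGLES)), eps, rng)
        reward = 1.0 if room.grasp(angle) else 0.0
        successes += reward
        # Both modules learn from the same sparse signal: did the grasp succeed?
        for q, key in ((grasp_q, (view, angle)), (nav_q, (has_obj, move))):
            q[key] = q.get(key, 0.0) + lr * (reward - q.get(key, 0.0))
    return successes / steps, nav_q, grasp_q
```

Calling `rate, nav_q, grasp_q = train()` returns the running grasp-success rate. Because staying on an empty cell can never produce a successful grasp, `nav_q[(False, 0)]` stays at zero and the navigation module learns to keep moving when nothing graspable is in view, without ever being told where objects are.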
Our method for fully autonomous real world reinforcement learning consists of the following:
Decompose the policy into grasping and navigation modules, and train both jointly with a grasp-success reward.
Train an ensemble of grasping policies to measure uncertainty and focus exploration on novel states.
Bootstrap the grasping policy with either a stationary or an automatic curriculum.
Run a pseudo-reset behavior after grasps so that training can run autonomously.
Train policies directly from camera input, avoiding the need for maps, instrumentation, or human labeling.
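The ensemble-uncertainty idea in the list above can be sketched in a few lines to show how disagreement among members separates familiar states from novel ones. Everything here is an illustrative assumption: `GraspMember`, `GraspEnsemble`, tabular per-state estimates instead of image-based networks, Bernoulli(0.5) masks as an online stand-in for bootstrap resampling, per-member random priors, and the optimistic `explore_score` ranking. The page does not specify how ReLMM converts uncertainty into exploration, so this is not its method.

```python
import random
import statistics


class GraspMember:
    """One hypothetical ensemble member: per-state grasp-success estimate with a random prior."""

    def __init__(self, rng):
        self.rng = rng
        self.prior = {}   # state -> random prior guess in [0, 1)
        self.totals = {}  # state -> (sum of bootstrapped labels, count)

    def _prior(self, state):
        # Drawn once per state, so members disagree on states nobody has trained on.
        if state not in self.prior:
            self.prior[state] = self.rng.random()
        return self.prior[state]

    def update(self, state, success):
        # Bernoulli(0.5) mask: a cheap online approximation of bootstrap resampling.
        if self.rng.random() < 0.5:
            s, n = self.totals.get(state, (0.0, 0))
            self.totals[state] = (s + float(success), n + 1)

    def predict(self, state):
        # The prior counts as one pseudo-observation that real data gradually outweighs.
        s, n = self.totals.get(state, (0.0, 0))
        return (self._prior(state) + s) / (1 + n)


class GraspEnsemble:
    def __init__(self, k=5, seed=0):
        self.members = [GraspMember(random.Random(seed * 100 + i)) for i in range(k)]

    def update(self, state, success):
        for m in self.members:
            m.update(state, success)

    def mean(self, state):
        return statistics.mean([m.predict(state) for m in self.members])

    def uncertainty(self, state):
        # Ensemble disagreement: population standard deviation of member predictions.
        return statistics.pstdev([m.predict(state) for m in self.members])

    def explore_score(self, state, beta=1.0):
        # Hypothetical optimistic score a navigation module could use to rank candidate states.
        return self.mean(state) + beta * self.uncertainty(state)
```

After many calls to `ens.update("seen", True)`, `ens.uncertainty("seen")` shrinks toward zero while a state that was never updated keeps the spread of the random priors. The priors are the key design choice: without them every member would predict the same default for unseen states, and disagreement would vanish exactly where exploration is needed.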
As shown in the training videos, the only human interventions needed during training were swapping out batteries and moving objects away from ungraspable positions that they may have been pushed to (e.g. flush to the wall).
All evaluation videos shown here are sped up 20x.
[Evaluation videos: four sets comparing Random navigation & grasping, Random navigation, Scripted policy, and ReLMM-StatCurr; the first two sets also include ReLMM-AutoCurr.]