Reset-Free Reinforcement Learning via Multi-Task Learning:
Learning Dexterous Manipulation Behaviors without Human Intervention

Abhishek Gupta*, Justin Yu*, Tony Z. Zhao*, Vikash Kumar*, Aaron Rovinsky, Kelvin Xu, Thomas Devlin, Sergey Levine

UC Berkeley and University of Washington

International Conference on Robotics and Automation (ICRA) 2021

[PAPER] [CODE]

We propose MTRF, a simple multi-task learning scheme that tackles the challenges of reset-free learning. MTRF learns to solve complex dexterous manipulation tasks, both in hardware and in simulation, without any explicit resets. This work demonstrates that dexterous manipulation behaviors can be learned in the real world with reinforcement learning, without any human intervention.

Overview

Reinforcement Learning (RL) algorithms can in principle acquire complex robotic skills by learning from large amounts of data in the real world, collected via trial and error. However, most RL algorithms use a carefully engineered setup in order to collect data, requiring human supervision and intervention to provide episodic resets. This is particularly evident in challenging robotics problems, such as dexterous manipulation. To make data collection scalable, such applications require reset-free algorithms that are able to learn autonomously, without explicit instrumentation or human intervention.

Key insight

Most prior work in this area handles single-task learning. However, we might also want robots that can perform large repertoires of skills. At first, this would appear to only make the problem harder. However, the key observation we make in this work is that an appropriately chosen multi-task RL setting actually alleviates the reset-free learning challenge, with minimal additional machinery required. In effect, solving a multi-task problem can directly solve the reset-free problem since different combinations of tasks can serve to perform resets for other tasks. By learning multiple tasks together and appropriately sequencing them, we can effectively learn all of the tasks together reset-free. This type of multi-task learning can effectively scale reset-free learning schemes to much more complex problems.
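The sequencing idea above can be illustrated with a minimal sketch. This is not the authors' implementation; the `Task` abstraction, the start-condition check, and the random sequencing rule are all assumptions made for illustration. The point it shows is the core mechanism: after each rollout, the current state is handed off to whichever task can start from it, so no manual reset is ever needed.

```python
import random

class Task:
    """A toy task: a start condition plus a policy rollout (names assumed, not from the paper)."""
    def __init__(self, name, can_start, run_policy):
        self.name = name
        self.can_start = can_start      # state -> bool: can this task begin from this state?
        self.run_policy = run_policy    # state -> next state (one training rollout)

def reset_free_training(tasks, state, num_rollouts):
    """Alternate between tasks without resets: after each rollout, pick a task
    whose start condition the current state satisfies, so tasks reset each other."""
    log = []
    for _ in range(num_rollouts):
        candidates = [t for t in tasks if t.can_start(state)]
        task = random.choice(candidates)  # simple sequencing rule (an assumption)
        state = task.run_policy(state)    # collect data / update this task's policy
        log.append(task.name)
    return state, log
```

With two toy tasks such as "lift" (starts with the object on the table, ends with it in the hand) and "place" (the reverse), the loop alternates between them indefinitely: each task's terminal state is exactly the other's start state, which is the reset-free property the paragraph above describes.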

Behaviors can be learned reset-free by leveraging multi-task settings

Summary

Video summary


Poster

Hardware System

We demonstrate large-scale reset-free learning in the context of dexterous manipulation. To do so, we built a 22-DoF arm + hand system: a custom-designed 16-DoF, 4-fingered robotic hand (D'Hand) mounted on a Sawyer robot arm, allowing it to operate in an extended workspace in a table-top setting. This hardware is particularly well suited to our problem setting because of its robustness and ease of long-term operation. The D'Hand can run uninterrupted for upwards of 100 hours on contact-rich tasks without any breakages, whereas previous hand-based systems are far more fragile. Thanks to the hand's modular design, even if a particular joint malfunctions, it is quick to repair and resume training. We operated the D'Hand for over 2000 hours over the span of this project, with minimal repairs.

Key Results (acquired without any human intervention)

TASK: In-Hand Manipulation

Training run & final acquired behavior

Task transition graph

TASK: Pipe Insertion

Training run & final acquired behavior

Task transition graph

Additional Results (simulation)

TASK: Bulb Insertion

Training run & final acquired behavior

Task transition graph

TASK: Basketball

Training run & final acquired behavior

Task transition graph