Every Policy has Something to Share: Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing

Abstract: Multi-task reinforcement learning (MTRL) aims to learn multiple tasks simultaneously for greater sample efficiency than learning in isolation. Traditional methods achieve this by sharing parameters among tasks or through cross-task data sharing. Instead, we propose a novel paradigm of cross-task behavioral policy sharing, which can be used in addition to existing MTRL approaches. The key idea is to enhance each task's data collection policy by sharing behaviors from other task policies. Effectively reusing behaviors acquired in one task to collect training data for another task leads to higher-quality trajectories, reducing the need for extensive exploration. Thus, we introduce a simple and generally applicable framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Furthermore, temporally extending the selected behavioral policy provides further gains.

Idea: Cross-task sharing of behavioral policies for data collection can accelerate multi-task reinforcement learning algorithms.
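The Q-switch selection rule described above can be sketched as follows. This is a minimal illustration under assumed names and signatures (`qmp_select_action`, toy 1-D policies and Q-function), not the paper's actual implementation: each task policy proposes a candidate action for the current state, the current task's Q-function scores every candidate, and the highest-scoring behavior is executed for data collection.

```python
import numpy as np

def qmp_select_action(state, policies, q_task):
    """Illustrative QMP action selection (hypothetical helper, not the paper's code).

    Every task policy proposes one candidate action; the *current* task's
    Q-function evaluates all candidates, and the best one is executed.
    """
    candidates = [policy(state) for policy in policies]        # one proposal per task policy
    scores = [q_task(state, a) for a in candidates]            # score with current task's Q
    best = int(np.argmax(scores))                              # Q-switch: keep the best behavior
    return candidates[best], best

# Toy example: 1-D actions, three task policies, and a Q-function for task 0.
policies = [
    lambda s: np.array([0.1]),   # task 0's own policy
    lambda s: np.array([0.8]),   # task 1's policy
    lambda s: np.array([-0.5]),  # task 2's policy
]
q_task0 = lambda s, a: -abs(a[0] - 0.7)  # task 0 prefers actions near 0.7

action, source = qmp_select_action(np.zeros(1), policies, q_task0)
# Here task 1's proposal (0.8) scores highest under task 0's Q-function,
# so task 0 borrows task 1's behavior for this step.
```

In this toy case the agent for task 0 selects task 1's proposed action because it has the highest estimated value, which is exactly the kind of selective cross-task behavior sharing the paper advocates.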

Result Videos

Reacher Multistage

QMP (Ours) Results for All 5 Tasks

Task 0

Task 1

Task 2

Task 3

Task 4

Behavior Sharing Baselines


Note: UDS goals are colored differently in the visualization but are in the same location as the other baselines.

Maze Navigation

QMP Results for All 10 Tasks

Task 0

Task 1

Task 2

Task 3

Task 4

Task 5

Task 6

Task 7

Task 8

Task 9

Meta-World Manipulation

QMP Results for All 4 Tasks

Door Open

Door Close

Drawer Open

Drawer Close

Behavior Sharing Baselines


Meta-World MT10

QMP Results for All 10 Tasks

Reach

Push

Pick Place

Door Open

Drawer Open

Drawer Close

Button Press

Peg Insert Side

Window Close

Window Open

Quantitative Results

Behavioral policy sharing provides complementary gains on top of no-sharing, parameter-sharing, and data-sharing methods!

QMP is the most reliable way to share behaviors, outperforming alternative behavior-sharing schemes.