Every Policy has Something to Share: Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing
Abstract: Multi-task reinforcement learning (MTRL) aims to learn multiple tasks simultaneously for greater sample efficiency than learning in isolation. Traditional methods achieve this by sharing parameters among tasks or through cross-task data sharing. Instead, we propose a novel paradigm of cross-task behavioral policy sharing, which can be used in addition to existing MTRL approaches. The key idea is to enhance each task's data collection policy by sharing behaviors from other task policies. Effectively reusing behaviors acquired in one task to collect training data for another task leads to higher-quality trajectories, reducing the need for extensive exploration. Thus, we introduce a simple and generally applicable framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Furthermore, temporally extending the selected behavioral policy provides further gains.
Idea: Cross-task sharing of behavioral policies for data collection can accelerate multi-task reinforcement learning algorithms.
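The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each task policy proposes one candidate action for the current state and that the current task's Q-function picks the highest-scoring candidate. The function `qmp_select_action`, the toy policies, and the toy Q-functions are all hypothetical names introduced here for illustration.

```python
def qmp_select_action(state, task_id, policies, q_functions):
    """Choose a data-collection action for `task_id` (illustrative sketch).

    Assumption: every task policy proposes one action, and the current
    task's Q-function selects the proposal with the highest estimated value.
    """
    # One candidate action per task policy, including the task's own policy.
    candidates = [policy(state) for policy in policies]
    # Score every candidate with the Q-function of the task being solved.
    q_task = q_functions[task_id]
    scores = [q_task(state, action) for action in candidates]
    # Index of the policy whose proposal scored highest.
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], best


# Toy setup: three hypothetical task policies that each emit a fixed 1-D action.
policies = [lambda s: -1.0, lambda s: 0.0, lambda s: 1.0]

# Hypothetical Q-functions: task i prefers actions close to targets[i].
targets = [-1.0, 1.0, 1.0]
q_functions = [lambda s, a, t=t: -((a - t) ** 2) for t in targets]

state = 0.0
action, source = qmp_select_action(state, 1, policies, q_functions)
print(action, source)
```

In this toy setup, task 1's own policy proposes `0.0`, but its Q-function prefers the action proposed by policy 2, so the behavior is borrowed from another task. When the task's own proposal scores highest, the selection falls back to the task's own policy.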
Result Videos
Reacher Multistage
QMP (Ours) Results for All 5 Tasks
Task 0
Task 1
Task 2
Task 3
Task 4
Behavior Sharing Baselines
Note: UDS goals are colored differently in the visualization but are in the same location as the other baselines.
Maze Navigation
QMP Results for All 10 Tasks
Task 0
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
Task 7
Task 8
Task 9
Meta-World Manipulation
QMP Results for All 4 Tasks
Door Open
Door Close
Drawer Open
Drawer Close
Behavior Sharing Baselines
Meta-World MT10
QMP Results for All 10 Tasks
Reach
Drawer Close
Push
Button Press
Pick Place
Peg Insert Side
Door Open
Window Close
Drawer Open
Window Open
Quantitative Results