Every Policy has Something to Share: Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing

Abstract: Multi-task reinforcement learning (MTRL) aims to learn multiple tasks simultaneously for greater sample efficiency than learning in isolation. Traditional methods achieve this by sharing parameters among tasks or through cross-task data sharing. Instead, we propose a novel paradigm of cross-task behavioral policy sharing, which can be used in addition to existing MTRL approaches. The key idea is to enhance each task's data collection policy by sharing behaviors from other task policies. Effectively reusing behaviors acquired in one task to collect training data for another task leads to higher-quality trajectories, reducing the need for extensive exploration. Thus, we introduce a simple and generally applicable framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Furthermore, temporally extending the selected behavioral policy provides further gains.

Idea: Cross-task sharing of behavioral policies for data collection can accelerate multi-task reinforcement learning algorithms.
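The Q-switch selection rule described above can be sketched as follows. This is a minimal illustration under assumed names and signatures (`qmp_select_action`, toy 1-D policies and Q-function), not the paper's actual implementation: each task policy proposes a candidate action for the current state, the current task's Q-function scores every candidate, and the highest-scoring behavior is executed for data collection.

```python
import numpy as np

def qmp_select_action(state, policies, q_task):
    """Illustrative QMP action selection (hypothetical helper, not the paper's code).

    Every task policy proposes one candidate action; the *current* task's
    Q-function evaluates all candidates, and the best one is executed.
    """
    candidates = [policy(state) for policy in policies]        # one proposal per task policy
    scores = [q_task(state, a) for a in candidates]            # score with current task's Q
    best = int(np.argmax(scores))                              # Q-switch: keep the best behavior
    return candidates[best], best

# Toy example: 1-D actions, three task policies, and a Q-function for task 0.
policies = [
    lambda s: np.array([0.1]),   # task 0's own policy
    lambda s: np.array([0.8]),   # task 1's policy
    lambda s: np.array([-0.5]),  # task 2's policy
]
q_task0 = lambda s, a: -abs(a[0] - 0.7)  # task 0 prefers actions near 0.7

action, source = qmp_select_action(np.zeros(1), policies, q_task0)
# Here task 1's proposal (0.8) scores highest under task 0's Q-function,
# so task 0 borrows task 1's behavior for this step.
```

In this toy case the agent for task 0 selects task 1's proposed action because it has the highest estimated value, which is exactly the kind of selective cross-task behavior sharing the paper advocates.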

Result Videos

Reacher Multistage

QMP (Ours) Results for All 5 Tasks

Task 0

Task 1

Task 2

Task 3

Task 4

Behavior Sharing Baselines


Note: UDS goals are colored differently in the visualization but are in the same location as the other baselines.

Maze Navigation

QMP Results for All 10 Tasks

Task 0

Task 1

Task 2

Task 3

Task 4

Task 5

Task 6

Task 7

Task 8

Task 9

Meta-World Manipulation

QMP Results for All 4 Tasks

Door Open

Door Close

Drawer Open

Drawer Close

Behavior Sharing Baselines


Meta-World MT10

QMP Results for All 10 Tasks

Reach

Push

Pick Place

Door Open

Drawer Open

Drawer Close

Button Press

Peg Insert Side

Window Close

Window Open

Quantitative Results

Behavioral policy sharing provides complementary gains on top of no-sharing, parameter-sharing, and data-sharing methods!

QMP is the most reliable way to share behaviors, outperforming alternative behavior-sharing schemes.