Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes
Aviral Kumar, Rishabh Agarwal, Xinyang (Young) Geng, George Tucker*, Sergey Levine*
Google Research & UC Berkeley
https://arxiv.org/abs/2211.15144
{aviralk, young.geng, svlevine}@eecs.berkeley.edu, {rishabhagarwal, gjt}@google.com
Can we train large models via offline RL on large datasets?
Offline RL research has largely centered around small-scale, single-task problems where broad generalization and learning general-purpose representations are not expected. In this work, we make a first attempt to scale offline Q-learning to large models trained on diverse multi-task datasets.
Approach:
Scaled Q-Learning
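Scaled QL builds on conservative Q-learning (CQL), paired in the paper with ResNet encoders, feature normalization, and a distributional C51 backup. The following is a minimal, hypothetical PyTorch sketch of just the core CQL regularizer added to a standard TD loss (not the paper's implementation): the function name, batch layout, and the mean-squared TD loss are simplifying assumptions.

```python
# Minimal sketch of a CQL-style update for discrete actions. Shapes and names
# are illustrative; the paper additionally uses ResNet encoders, feature
# normalization, and a categorical C51 backup in place of the MSE TD loss.
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, gamma=0.99, cql_alpha=0.1):
    s, a, r, s_next, done = batch            # s: [B, ...], a: [B] (long), r/done: [B] (float)
    q_all = q_net(s)                          # [B, num_actions]
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for dataset actions

    with torch.no_grad():
        # Standard TD target; scaled QL uses a distributional (C51) backup instead.
        next_q = target_q_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * next_q

    td_loss = F.mse_loss(q_sa, td_target)

    # CQL regularizer: push down Q-values over all actions (log-sum-exp) while
    # pushing up Q-values on the actions actually present in the dataset.
    cql_reg = (torch.logsumexp(q_all, dim=1) - q_sa).mean()

    return td_loss + cql_alpha * cql_reg
```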
Key Empirical Results
Scaled Q-learning exhibits favorable scaling: performance improves as model size increases (both with and without C51 distributional backups).
Scaled Q-learning outperforms prior methods based on return-conditioned supervised learning: the gains are largest on sub-optimal data, where Q-learning benefits from its ability to stitch together parts of different trajectories.
Representations learned by scaled Q-learning generalize to unseen games and to new modes of the training games, performing well under both offline and online fine-tuning (see the sketch below). State-of-the-art representation learning methods, such as masked auto-encoders, do not transfer as well; the control-centric representations learned by scaled QL perform better.
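To make the fine-tuning setup concrete, here is a hypothetical sketch of reusing a pretrained scaled-QL encoder on an unseen game with a freshly initialized Q-head. The function, feature dimension, and learning rates are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch: transfer the pretrained representation to a new game by
# keeping the encoder and re-initializing the Q-head, then fine-tune
# (offline or online). Names and shapes are illustrative.
import copy
import torch
import torch.nn as nn

def build_finetune_model(pretrained_encoder: nn.Module,
                         feature_dim: int,
                         num_actions: int) -> nn.Module:
    encoder = copy.deepcopy(pretrained_encoder)   # reuse pretrained weights
    head = nn.Linear(feature_dim, num_actions)    # fresh Q-head for the new game
    return nn.Sequential(encoder, head)

# Typical usage: fine-tune the whole model, with a smaller learning rate on the
# pretrained encoder than on the newly initialized head.
# model = build_finetune_model(pretrained_encoder, feature_dim=512, num_actions=18)
# optim = torch.optim.Adam([
#     {"params": model[0].parameters(), "lr": 1e-5},   # pretrained encoder
#     {"params": model[1].parameters(), "lr": 1e-4},   # new Q-head
# ])
```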