Offline Q-Learning on Diverse Multi-Task Data Both Scales and Generalizes

Aviral Kumar, Rishabh Agarwal, Xinyang (Young) Geng, George Tucker*, Sergey Levine*

Google Research & UC Berkeley

https://arxiv.org/abs/2211.15144

{aviralk, young.geng, svlevine}@eecs.berkeley.edu, {rishabhagarwal, gjt}@google.com

Can we train large models via offline RL on large datasets?

Offline RL research has largely centered on small-scale, single-task problems, where broad generalization and general-purpose representation learning are not expected. In this work, we take a first step toward scaling offline Q-learning with high-capacity models trained on large, diverse datasets.

Approach: Scaled Q-Learning
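
Scaled QL builds on conservative Q-learning (CQL), trained jointly on data from many Atari games with high-capacity ResNet encoders; the distributional variant replaces the standard TD loss with a categorical (C51) backup. Below is a minimal sketch of the simpler non-distributional CQL objective, not the authors' implementation: the module names, batch format, and hyperparameters (`q_net`, `target_net`, `cql_alpha`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, cql_alpha=0.1):
    """Conservative Q-learning loss = TD error + conservative regularizer.

    batch: dict with 'obs', 'action', 'reward', 'next_obs', 'done';
    actions are discrete indices, observations are image stacks.
    """
    obs, act = batch["obs"], batch["action"]                  # [B, ...], [B]
    rew, next_obs, done = batch["reward"], batch["next_obs"], batch["done"]

    q_all = q_net(obs)                                        # [B, num_actions]
    q_taken = q_all.gather(1, act.unsqueeze(1)).squeeze(1)    # Q(s, a) for dataset actions

    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values       # bootstrapped target
        td_target = rew + gamma * (1.0 - done) * next_q

    td_loss = F.mse_loss(q_taken, td_target)

    # Conservative regularizer: push down Q-values over all actions (logsumexp)
    # while pushing up Q-values on actions actually seen in the dataset.
    cql_reg = (torch.logsumexp(q_all, dim=1) - q_taken).mean()

    return td_loss + cql_alpha * cql_reg
```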

Key Empirical Results

Scaled Q-learning exhibits favorable scaling: performance improves as model size increases, both with and without the C51 distributional backup.

Scaled Q-learning outperforms prior methods based on return-conditioned supervised learning: the gains are largest on sub-optimal data, where Q-learning benefits from its ability to stitch together useful segments of different trajectories.

Representations learned by scaled Q-learning transfer to unseen games and to new modes of the training games, achieving strong performance under both offline and online fine-tuning. State-of-the-art representation learning methods such as masked auto-encoders do not perform as well; the control-centric representations learned by scaled QL are more effective for downstream RL.
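
A minimal sketch, under assumed module names, of how a pretrained scaled-QL encoder might be reused for a new game: keep the pretrained visual backbone and attach a freshly initialized Q head for the new game's action space, then fine-tune offline or online. The helper `load_pretrained_encoder` and the learning rates are hypothetical.

```python
import torch
import torch.nn as nn

class FineTunedQNetwork(nn.Module):
    """Pretrained multi-game encoder with a fresh per-game Q head."""

    def __init__(self, pretrained_encoder: nn.Module, feature_dim: int, num_actions: int):
        super().__init__()
        self.encoder = pretrained_encoder                      # weights from multi-game pretraining
        self.q_head = nn.Linear(feature_dim, num_actions)      # re-initialized for the new game

    def forward(self, obs):
        return self.q_head(self.encoder(obs))

# Example usage (hypothetical checkpoint loader and hyperparameters):
# encoder = load_pretrained_encoder("scaled_ql_checkpoint.pt")
# model = FineTunedQNetwork(encoder, feature_dim=512, num_actions=18)
# optim = torch.optim.Adam([
#     {"params": model.encoder.parameters(), "lr": 1e-5},   # smaller lr for pretrained backbone
#     {"params": model.q_head.parameters(), "lr": 1e-4},    # larger lr for the new head
# ])
```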
