NeurIPS 2020 Tutorial
Offline Reinforcement Learning:
From Algorithms to Practical Challenges

Aviral Kumar, Sergey Levine
UC Berkeley

Abstract

Reinforcement learning (RL) provides a mathematical formalism for learning-based control that allows near-optimal behaviors to be acquired by optimizing user-specified reward functions. While RL methods have received considerable attention recently due to impressive applications in many areas, the fact that conventional RL requires a fundamentally online learning paradigm is one of the biggest obstacles to its widespread adoption. Online interaction is often impractical, because data collection is expensive (e.g., in robotics or educational agents) or dangerous (e.g., in autonomous driving or healthcare). An alternative approach is to utilize RL algorithms that effectively leverage previously collected experience without requiring online interaction. This has been referred to as batch RL, offline RL, or data-driven RL. Such algorithms hold tremendous promise for making it possible to turn datasets into powerful decision-making engines, much as datasets have proven key to the success of supervised learning in vision and NLP. In this tutorial, we aim to provide the audience with the conceptual tools needed both to utilize offline RL as a tool and to conduct research in this exciting area. We aim to convey an understanding of the challenges in offline RL, particularly in the context of modern deep RL methods, and to describe some potential solutions that have been explored in recent work, along with their applications. We will present classic and recent methods in a way that is accessible to practitioners, and we will also discuss the theoretical foundations for conducting research in this field. We will conclude with a discussion of open problems.
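To make the setting concrete, below is a minimal sketch of what "learning from previously collected experience without online interaction" means in code: tabular Q-learning trained purely from a fixed dataset of logged transitions. Everything here (the toy state/action spaces, the synthetic dataset, all variable names) is an illustrative assumption for exposition, not an algorithm from the tutorial itself.

```python
# Minimal sketch of the offline RL setting (illustrative assumptions only):
# tabular Q-learning on a static dataset, with no environment interaction.
import numpy as np

n_states, n_actions = 5, 2
gamma, lr = 0.99, 0.1

# A static dataset of (state, action, reward, next_state, done) tuples,
# e.g. logged earlier by some behavior policy. Synthetic here for brevity.
rng = np.random.default_rng(0)
dataset = [
    (s, a, float(rng.random()), (s + a) % n_states, False)
    for s in range(n_states)
    for a in range(n_actions)
]

Q = np.zeros((n_states, n_actions))
for _ in range(1000):
    # Sample transitions from the fixed dataset only -- the defining
    # property of offline RL: no new data is collected during training.
    s, a, r, s2, done = dataset[rng.integers(len(dataset))]
    target = r + (0.0 if done else gamma * Q[s2].max())
    Q[s, a] += lr * (target - Q[s, a])

# Greedy policy from the learned Q-function. Note that naively maximizing
# over actions poorly covered by the data causes the distributional-shift
# problems that dedicated offline RL algorithms are designed to address.
policy = Q.argmax(axis=1)
print(policy)
```

This naive recipe illustrates both the appeal of the setting (training needs only a dataset) and its central difficulty: the Q-function is queried on actions the dataset never contains, which is precisely where the methods covered in the tutorial intervene.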

Presenter Bios

Sergey Levine (UC Berkeley/Google Research) is an assistant professor at UC Berkeley and a research scientist at Google. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. His prior work includes some of the most widely used deep reinforcement learning algorithms, such as TRPO and SAC, as well as a number of recent offline reinforcement learning algorithms.

Aviral Kumar (UC Berkeley) is a third-year Ph.D. student in Computer Science advised by Sergey Levine. His research focuses on offline reinforcement learning and on understanding and addressing the challenges of deep reinforcement learning, with the goal of making RL a general-purpose, widely applicable, scalable, and reliable paradigm for autonomous decision making.

References

(We apologize if we missed any references! We are happy to add new ones; please send an email to aviralk@berkeley.edu.)

offline_rl_tutorial_references.pdf