Speaker: Uri Sherman (Tel Aviv University)
Title: Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning
The recording will be uploaded here after the event.
Authors: Uri Sherman, Tomer Koren, Yishay Mansour
Abstract: It is well known that efficient policy learning is possible in the function approximation setting subject to policy class completeness and suitable coverage conditions, in particular without making any additional assumptions on the structure of the environment. In this talk, I will discuss our recent results demonstrating that efficient policy learning is possible under a weaker assumption originating in the optimization literature, namely, variational gradient dominance (VGD) of the value function. I will present a general policy learning framework that reduces the reinforcement learning problem to first-order optimization in a non-Euclidean space, and show how this leads to new algorithms as well as sheds new light on the convergence properties of existing ones. Specifically, we will discuss a novel Steepest Descent Policy Optimization method, the well-known Policy Mirror Descent algorithm, and the classical Conservative Policy Iteration algorithm (Kakade and Langford, 2002), which we revisit through the lens of the Frank-Wolfe method. Time permitting, we will conclude with a discussion of the practical relevance of the VGD condition and the algorithms considered.
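For readers unfamiliar with the terminology in the abstract, the following is a schematic sketch of a variational gradient dominance condition and of the Policy Mirror Descent update as they commonly appear in the policy optimization literature; the notation, constants, and choice of mirror map below are illustrative assumptions and are not taken from the paper. A VGD-type condition bounds the suboptimality of a policy by the best first-order improvement available within the policy class,

$$ V^\star - V(\pi) \;\le\; c \cdot \max_{\pi' \in \Pi} \big\langle \nabla V(\pi),\, \pi' - \pi \big\rangle \;+\; \varepsilon \qquad \text{for all } \pi \in \Pi, $$

where $c$ is a problem-dependent constant and $\varepsilon$ an additive approximation error, while the Policy Mirror Descent update with a KL-divergence mirror map takes the form

$$ \pi_{t+1}(\cdot \mid s) \;\in\; \arg\max_{p \in \Delta(\mathcal{A})} \Big\{ \eta \,\big\langle Q^{\pi_t}(s, \cdot),\, p \big\rangle \;-\; \mathrm{KL}\big(p \,\|\, \pi_t(\cdot \mid s)\big) \Big\}, $$

with step size $\eta > 0$ and state-action value function $Q^{\pi_t}$.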
Speaker Bio: Uri is a fifth-year PhD student at Tel Aviv University, advised by Yishay Mansour and Tomer Koren. Prior to his PhD, Uri spent several years in engineering and management positions in the private sector; before that, he obtained his B.Sc. from Tel Aviv University and his M.Sc. from the Weizmann Institute of Science, where he worked with Prof. Uriel Feige. Uri's research interests are in stochastic optimization and reinforcement learning theory.