Seminar Series
Upcoming Talk
Reinforcement Learning for Large-Scale Games
This talk addresses the use of reinforcement learning in two-player zero-sum Markov games with finite but large state spaces, where the goal is to find minimax policies with “modest” computation. We use the qualifier “modest” to mean that we seek to certify policies as optimal without exploring the full state space of the game.
The approach followed is strongly motivated by Q-learning, which was proposed in the late 1980s to extend the single-player dynamic programming principle to model-free reinforcement learning by eliminating the need for a known transition model. Extensions of Q-learning to two-player zero-sum games appeared shortly after. Since then, most work devoted to proving the correctness of Q-learning relies on establishing that its iteration converges to the unique fixed point of a Bellman-like equation, which generally requires exploring the full state space.
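To make the model-free flavor of these updates concrete, here is a minimal illustrative sketch, not the speaker's algorithm: a tabular Q-update on a hypothetical single-state zero-sum matrix game with a pure-strategy saddle point. The payoff matrix, step size, and iteration count are all assumptions for the toy example; the point is that the minimax value is recovered from sampled rewards alone, with no transition model.

```python
import random

# Toy zero-sum matrix game (assumed for illustration): payoff to the maximizer.
# It has a pure-strategy saddle point at (a=1, b=1) with game value 2.
R = [[3.0, 1.0],
     [4.0, 2.0]]

Q = [[0.0, 0.0], [0.0, 0.0]]  # learned joint-action values
alpha = 0.1                    # step size (assumed)
random.seed(0)

for _ in range(5000):
    a, b = random.randrange(2), random.randrange(2)  # explore joint actions
    r = R[a][b]                                       # sampled reward; model never consulted
    Q[a][b] += alpha * (r - Q[a][b])                  # game ends immediately: no bootstrap term

# Extract the minimax value from the learned Q-table.
value = max(min(Q[a][b] for b in range(2)) for a in range(2))
print(round(value, 2))
```

In a multi-state game the update would bootstrap on the minimax value of the successor state, which is exactly where the fixed-point argument, and the need to visit the whole state space, enters.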
We will see that, for zero-sum games, it is possible to construct provably correct optimal policies using algorithms inspired by Q-learning, without requiring convergence of the Q-function over the whole state space. In fact, the samples used to update the Q-function may not even explore the whole set of reachable states and, for certain classes of games, the fraction of explored states shrinks as the size of the state space increases.