Talk Date and Time: November 17, 2022, 4:00 pm to 4:45 pm EST, followed by 10 minutes of Q&A. Location: Zoom and IRB-5105.
Topic: Recent Progress on Game-Theoretic Reinforcement Learning for Two-Player Zero-sum Games
Abstract:
In a previous talk, I summarized progress on the development of game-theoretic reinforcement learning for two-player zero-sum games. In this talk, I will describe three new algorithms in this category. The first is Simplex NeuPL, a population-based training method that learns to respond to any convex mixture of the policies it generates during training. The second and third belong to a new class of algorithms that can be broadly categorized as iterative regularization refinement: Regularized Nash Dynamics (R-NaD) and Magnetic Mirror Descent (MMD). MMD adds proximal regularization and achieves linear convergence to quantal-response equilibria (QRE). Both methods first converge to biased equilibria and then reduce the bias over time, converging arbitrarily close to Nash equilibria. All of these methods avoid the otherwise-necessary step of explicitly computing the average strategy, but they do so in different ways. Finally, I will show how R-NaD has proven to work particularly well at scale, leading to DeepNash: an agent that achieves human-level performance in the game of Stratego.
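To make the proximal-regularization idea concrete, here is a minimal sketch of a magnetic-mirror-descent-style update on a small matrix game. It assumes the standard closed-form simplex update under a negative-entropy mirror map, with a uniform "magnet" policy; the function names, step size `eta`, and regularization weight `alpha` are illustrative choices, not the talk's actual implementation. With a uniform magnet, the QRE of rock-paper-scissors is the uniform strategy, so both players' policies are pulled there.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum game).
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def mmd_step(x, grad, magnet, eta, alpha):
    """One MMD-style update on the probability simplex.

    Maximizes eta*<grad, x> - eta*alpha*KL(x, magnet) - KL(x, x_t),
    which has a closed-form softmax solution under negative entropy.
    """
    logits = (np.log(x) + eta * alpha * np.log(magnet) + eta * grad)
    logits /= (1.0 + eta * alpha)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

def run_mmd(iters=1000, eta=0.05, alpha=0.5):
    x = np.ones(3) / 3.0                 # row player, uniform start
    y = np.array([0.6, 0.3, 0.1])        # column player, non-uniform start
    magnet = np.ones(3) / 3.0            # uniform magnet => uniform QRE here
    for _ in range(iters):
        gx = A @ y        # row player ascends its expected payoff
        gy = -A.T @ x     # column player ascends its own (negated) payoff
        x = mmd_step(x, gx, magnet, eta, alpha)
        y = mmd_step(y, gy, magnet, eta, alpha)
    return x, y
```

Shrinking `alpha` toward zero weakens the pull toward the magnet, which is the "reduce the bias over time" step: the regularized equilibrium moves arbitrarily close to a Nash equilibrium of the unregularized game.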
Bio:
Marc Lanctot is a research scientist at DeepMind. His research interests include multiagent reinforcement learning, computational game theory, multiagent systems, and game-tree search. In the past few years, Marc has investigated game-theoretic approaches to multiagent reinforcement learning with applications to fully and partially observable zero-sum games, sequential social dilemmas, and negotiation/communication games. Marc received a Ph.D. in artificial intelligence from the Department of Computer Science, University of Alberta, in 2013. Before joining DeepMind, he completed a postdoctoral research fellowship at the Department of Knowledge Engineering, Maastricht University, in Maastricht, The Netherlands, working on Monte Carlo tree search methods in games.