Talk Date and Time: September 15, 2022, 4:00 pm - 4:45 pm ET, followed by 10 minutes of Q&A (on Zoom and in IRB-5105)
Topic: ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Abstract:
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR). In this paper, we propose an unbiased, model-free method that does not require any importance sampling. Our method, ESCHER, is principled and is guaranteed to converge to an approximate Nash equilibrium with high probability in the tabular case. We show that the variance of the estimated regret of a tabular version of ESCHER with an oracle value function is significantly lower than that of outcome-sampling MCCFR and tabular DREAM with an oracle value function. We then show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self-play (NFSP) -- and the difference becomes dramatic as game size increases.
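To give a feel for the variance issue the abstract describes, below is a minimal, purely illustrative Python sketch (not ESCHER or DREAM themselves; the action values, sampling policy, and variable names are hypothetical). It contrasts an outcome-sampling-style estimator, which divides by the probability of the sampled action and so inherits an importance sampling term, with an estimator that simply queries an oracle value for each action, in the spirit of using a history value function instead of importance weights.

```python
import numpy as np

# Illustrative sketch only: compare the per-action variance of
# (a) an importance-weighted estimate (outcome-sampling MCCFR flavor) with
# (b) an estimate read off an oracle value function (no importance weights).
rng = np.random.default_rng(0)

true_values = np.array([1.0, 0.2, -0.5])    # hypothetical action values at one decision point
sample_policy = np.array([0.6, 0.3, 0.1])   # hypothetical policy used to sample actions
n_trials = 100_000

# (a) Importance-sampled estimator: only the sampled action gets a nonzero
# estimate, scaled by 1 / prob(sampled action).
a = rng.choice(3, size=n_trials, p=sample_policy)
is_estimates = np.zeros((n_trials, 3))
is_estimates[np.arange(n_trials), a] = true_values[a] / sample_policy[a]

# (b) Value-function-style estimator: every action's value is queried directly
# from an oracle, so no importance sampling term appears.
vf_estimates = np.tile(true_values, (n_trials, 1))

print("mean (a):", is_estimates.mean(axis=0))       # both estimators are unbiased
print("mean (b):", vf_estimates.mean(axis=0))
print("variance (a):", is_estimates.var(axis=0))    # grows as sampling probabilities shrink
print("variance (b):", vf_estimates.var(axis=0))    # zero with an oracle value function
```

Both estimators have the same expectation, but the importance-weighted one has variance that blows up for rarely sampled actions, which is the gap the talk's method aims to close (with a learned rather than oracle value function in the deep learning setting).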
Bio:
Stephen McAleer is a postdoc at Carnegie Mellon University (CMU) in Pittsburgh, working on reinforcement learning and game theory with Prof. Tuomas Sandholm. He received a PhD in computer science from the University of California, Irvine in 2021, where he was advised by Pierre Baldi. During his PhD, he did research scientist internships at Intel Labs and DeepMind. Before that, he received a bachelor's degree in mathematics and economics from Arizona State University in 2017.