Speaker: Panayotis Mertikopoulos
Abstract: This tutorial aims to provide a concise introduction to online learning from a multi-agent, game-theoretic viewpoint. We will start by quickly reviewing some basics of no-regret learning in adversarial multi-armed bandit problems, focusing in particular on the multiplicative/exponential weights algorithm, its variants, and the impact of the information available to the learner. Subsequently, we will explore and seek to characterize the long-run rationality properties of no-regret learning in finite games, i.e., when all players of a finite game follow a regularized learning algorithm (like exponential weights): specifically, we will seek to understand whether the sequence of play converges to equilibrium, under which conditions, at what rate, and what other limiting behaviors may arise in the long run.
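To make the centerpiece algorithm concrete, below is a minimal sketch of the exponential-weights update in its bandit form (EXP3), assuming losses in [0, 1]; the function name and the loss_fn callback are illustrative placeholders, not part of the tutorial material:

```python
import numpy as np

def exp3(loss_fn, n_arms, horizon, eta=0.1, rng=None):
    """Minimal EXP3 sketch: exponential weights with importance-weighted
    bandit feedback (only the chosen arm's loss is observed)."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.ones(n_arms)
    for t in range(horizon):
        probs = weights / weights.sum()      # current mixed strategy
        arm = rng.choice(n_arms, p=probs)    # sample an action
        loss = loss_fn(t, arm)               # adversary's loss in [0, 1]
        est = np.zeros(n_arms)
        est[arm] = loss / probs[arm]         # unbiased loss estimate
        weights *= np.exp(-eta * est)        # multiplicative update
    return weights / weights.sum()
```

With full-information feedback, the importance-weighted estimate is simply replaced by the observed loss vector, recovering the plain multiplicative/exponential weights update; this is one way the information available to the learner changes the algorithm.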
Speaker: Julia Olkhovskaya
Speaker: Kaiqing Zhang
Abstract: This tutorial will provide a gentle introduction to multi-agent reinforcement learning (MARL), covering both classical and modern results. We will start by reviewing the basics of stochastic games, a canonical framework for MARL, and the planning algorithms that compute their solutions when the model is known. We will then review several early classical MARL algorithms, which often come with asymptotic convergence guarantees, before focusing on more recent “modern” developments in MARL with non-asymptotic guarantees in different RL regimes. Along the way, we will also discuss several problem settings and algorithms new to MARL (theory), followed by some thoughts on MARL from a "Learning-in-Games" perspective.
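As a concrete example of planning with model knowledge, the following is a minimal sketch of Shapley-style value iteration for two-player zero-sum stochastic games; the array layout of R and P and the helper matrix_game_value are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of a zero-sum matrix game (row player maximizes) via LP."""
    m, n = M.shape
    # variables: row strategy x (m entries) followed by the game value v
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v <= (M^T x)_j for all j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                       # x is a distribution
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[-1]

def shapley_value_iteration(R, P, gamma=0.9, iters=200):
    """Shapley-style value iteration for a zero-sum stochastic game.
    R[s] is the stage payoff matrix at state s; P[s][a][b] is the
    next-state distribution after the action pair (a, b)."""
    S = len(R)
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.empty(S)
        for s in range(S):
            # one-step lookahead stage game, then solve it exactly
            Q = R[s] + gamma * np.einsum('abk,k->ab', P[s], V)
            V_new[s] = matrix_game_value(Q)
        V = V_new
    return V
```

The contraction property of the one-step operator guarantees convergence of this iteration, which is the starting point for the model-based planning results the tutorial reviews.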
Speaker: Chi Jin
Abstract: While classical game theory primarily focuses on finding equilibria, modern machine learning applications introduce a series of new challenges where standard equilibrium notions are no longer sufficient and new efficient algorithmic solutions are urgently needed. In this talk, we will demonstrate two such scenarios: (1) a natural goal in multi-agent learning is to learn rationalizable behavior, which avoids iteratively dominated actions. Unfortunately, such rationalizability is not guaranteed by standard equilibria, especially when approximation errors are present. Our work presents the first line of efficient algorithms for learning rationalizable equilibria, with sample complexities that are polynomial in all problem parameters, including the number of players; (2) in multiplayer symmetric constant-sum games such as Mahjong or Poker, a natural baseline is to achieve an equal share of the total reward. We demonstrate that the self-play meta-algorithms used by existing state-of-the-art systems can fail to achieve this simple baseline in general symmetric games. We will then discuss the new principled solution concept required to achieve this goal.
Speaker: Ioannis Panageas
Abstract: In this talk, we will focus on the problem of computing Nash equilibria in Markov games. The seminal work of Daskalakis et al. (and later works) indicates that computing Nash equilibria in multi-player Markov games is computationally hard. This raises the question of whether computational intractability can be circumvented by focusing on specific classes of Markov games. We answer the question in the affirmative by providing computationally efficient methods that approximate Nash equilibria for various classes of Markov games, including Markov potential games, adversarial team Markov games, and polymatrix Markov games.
Speaker: Caglar Gulcehre
Speaker: Maryam Kamgarpour
Abstract: A significant challenge in managing large-scale engineering systems, such as energy and transportation networks, lies in enabling autonomous decision-making by interacting agents. Game theory offers a framework for modeling and analyzing this class of problems. In many practical applications, each player has only partial information about the cost functions and actions of the others. A decentralized learning approach is therefore essential for devising optimal strategies for each player.
My talk will focus on recent advances in decentralized learning in static and Markov games under bandit feedback. It highlights the challenges relative to single-agent learning and presents algorithms with provable convergence guarantees. The first part covers advances in learning in continuous-action static games, including convergence conditions and optimal rates. The second part explores Markov games, presenting our learning approaches for zero-sum Markov games and for coarse-correlated equilibria in general-sum Markov games. I will conclude by presenting a few open research directions.
Speaker: Giorgia Ramponi