Note: Presentation titles in parentheses provide tentative descriptions of upcoming talks, and will be replaced with actual presentation titles when available. Titles for past talks link to more information about the talk and speaker, as well as a recording and/or slide deck when available.
Date | Speaker | Presentation Title
August 22 | Maryam Kamgarpour | Learning equilibria in games with bandit feedback
September 5 | Guannan Qu | Locally Interdependent Multi-Agent MDP: A Scalable Approach to Partially Observable Locally Dependent Systems
September 12 | Jorge Poveda | Deception and Incentives in Multi-Agent Learning: A Control-Theoretic View
September 19 | Florian Dörfler | LQR Learning Pipelines
September 26 | Rasoul Etesami | Learning Stationary Nash Equilibrium Policies in n-Player Stochastic Games with Independent Chains
October 3 | Eric Mazumdar | Principled Multi-agent Learning
October 10 | Thinh Doan | Multi-Time-Scale Stochastic Approximation as a Tool for Multi-Agent Learning and Distributed Optimization
October 31 | Sophie Hall | Receding Horizon Games: System Theory, Learning Equilibria and Applications
November 14 | Shangding Gu | Toward Safe and Scalable Reinforcement Learning in Robotic Systems and Beyond
November 21 | Niao He | Scaling Multi-Agent Reinforcement Learning to the Mean-Field Regime
December 5 | Rahul Mangharam, Hongrui Zheng and Nandan Tumu | MAD Games: Multi-Agent Dynamic Games with Collaborative Teams in Adversarial Competitions
Maryam Kamgarpour (EPFL, Switzerland)
Title: Learning equilibria in games with bandit feedback
Abstract: A significant challenge in managing large-scale engineering systems, such as energy and transportation networks, lies in enabling autonomous decision-making by interacting agents. Game theory offers a framework for modeling and analyzing this class of problems. In many practical applications, each player has only partial information about the cost functions and actions of others. A decentralized learning approach is therefore essential for devising optimal strategies for each player. My talk will focus on recent advances in decentralized learning in static and Markov games under bandit feedback. It highlights the challenges relative to single-agent learning and presents algorithms with provable convergence. The first part will focus on learning in continuous-action static games. The second part will explore Markov games, presenting our learning approaches for zero-sum Markov games and for coarse-correlated equilibria in general-sum Markov games. I will conclude by presenting a few open research directions.
Bio: Maryam Kamgarpour is a professor in the School of Engineering of École Polytechnique Fédérale de Lausanne. Prior to joining EPFL, she served as a faculty member at the University of British Columbia and at ETH Zürich. Her research focuses on advancing the fundamental understanding of multi-agent decision-making in uncertain and dynamic environments. Towards this goal, she develops algorithms for safe stochastic control and reinforcement learning, as well as for game theory and mechanism design. Her theoretical research is driven by control challenges arising in intelligent transportation networks, robotics, and power grid systems. She holds a Doctor of Philosophy in Engineering from the University of California, Berkeley and a Bachelor of Applied Science from the University of Waterloo, Canada. She has received the NASA High Potential Individual Award and the NASA Excellence in Publication Award for her work on air traffic control with NASA Ames Research Center, and a European Union Starting Grant for her work on multi-agent decision-making and distributed control for power systems. She is the recipient of the 2022 IEEE Transactions on Control of Network Systems Outstanding Paper Award and the 2024 European Control Award. She is a fellow of ELLIS and an associate editor of the IEEE Transactions on Automatic Control.
Video Link: https://www.youtube.com/watch?v=WsN-Q4C_kXk
Guannan Qu (CMU, United States)
Title: Locally Interdependent Multi-Agent MDP: A Scalable Approach to Partially Observable Locally Dependent Systems
Abstract: Partially observable multi-agent MDP problems are challenging to solve, both because of the exponential growth of the state space and because of partial observability. These problems are often modelled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), which is NEXP-complete and generally regarded as intractable. However, many important partially observable problems, such as cooperative navigation, obstacle avoidance, and formation control, with applications in autonomous driving, UAVs, and robot navigation, satisfy a locality assumption on the interactions, such as local collisions or local formation constructions.
In this presentation, we will introduce a new framework, the Locally Interdependent Multi-Agent MDP (LIMDP), that leverages the above locality assumptions. Importantly, we show that the LIMDP has solutions that are more tractable to compute than those of a general Dec-POMDP and can even counteract the curse of dimensionality. Further, these more tractable partially observable solutions are provably exponentially close to the fully observable joint optimal policy with respect to the visibility. This can be matched with a lower bound to establish near-optimality. We will present various simulations and, in particular, demonstrate the scalability of these methods with a simulation of 100 agents navigating a grid world without collisions, computed without the use of any empirical methods.
Bio: Guannan Qu has been an Assistant Professor in the Electrical and Computer Engineering Department of Carnegie Mellon University since September 2021. He received his B.S. degree in Electrical Engineering from Tsinghua University in Beijing, China in 2014, and his Ph.D. in Applied Mathematics from Harvard University in Cambridge, MA in 2019. He was a CMI and Resnick postdoctoral scholar in the Department of Computing and Mathematical Sciences at the California Institute of Technology from 2019 to 2021. He is the recipient of the NSF CAREER Award, a finalist for the ICRA 2025 Best Conference Paper Award, the Caltech Simoudis Discovery Award, the PIMCO Fellowship, the Amazon AI4Science Fellowship, and the IEEE SmartGridComm Best Student Paper Award. His research interests lie in control, optimization, and machine/reinforcement learning, with applications to power systems, multi-agent systems, the Internet of Things, smart cities, etc.
Video Link: https://www.youtube.com/watch?v=K1yEL3VxlfE
Jorge Poveda (UCSD)
Title: Deception and Incentives in Multi-Agent Learning: A Control-Theoretic View
Abstract: We study the problem of real-time learning of Nash equilibria in multi-agent systems that model non-cooperative games. In this setting, most adaptive algorithms rely on exploration–exploitation mechanisms that enable agents to adjust their actions using only environmental feedback, operating in a fully model-free and communication-free manner. Under symmetric information assumptions and across a broad class of games, these algorithms consistently converge to a Nash equilibrium, leaving players with no incentive to deviate from their steady-state strategies. When this symmetry is broken, however, agents with privileged knowledge of others’ policies or parameters can manipulate the learning process by adjusting their own exploration mechanisms. In doing so, they preserve their own learning outcomes while causing oblivious agents to systematically develop false beliefs. We show that these beliefs can be indirectly and dynamically controlled in real time by the privileged agents, effectively steering the system toward a different steady state—one corresponding to the Nash equilibrium of an alternative "deceptive game". Although such deception is typically designed to increase the payoff of the deceptive players, we show that in a broad class of games, it can also give rise to "benevolent deception", leading the system to converge to operating points that are more efficient than standard equilibria. In this setting, benevolent deception compels oblivious agents to adopt actions that improve both their own performance and the collective efficiency of the system. Finally, we characterize the stability and convergence properties of the "deceptive dynamics" using control-theoretic tools.
Bio: Jorge I. Poveda received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from UC Santa Barbara in 2016 and 2018, respectively. He also worked as a Research Intern at the Mitsubishi Electric Research Laboratories in 2016 and 2017. Following this, he was a Postdoctoral Fellow at Harvard University and an Assistant Professor at the University of Colorado, Boulder. Since 2022, he has been with UC San Diego, where he is an Associate Professor in the ECE Department and Associate Director of the Center for Control Systems and Dynamics. He is the recipient of the CRII and CAREER awards from NSF, Young Investigator awards from AFOSR and SHPE, the 2023 Donald P. Eckman Award from AACC, the Best Paper Award from the IEEE Transactions on Control of Network Systems, and the CCDC Outstanding Scholar Fellowship award and Best Ph.D. Dissertation award from UC Santa Barbara. Furthermore, he has advised students selected as finalists for the Best Student Paper award at the 2024 American Control Conference and as winners of the Young Author Award at the 2024 IFAC Conference on Analysis and Design of Hybrid Systems. He was also a finalist for the Best Student Paper Award at the IEEE Conference on Decision and Control in 2017 (as a student) and 2021 (as a co-author). He serves as Associate Editor for Automatica, Nonlinear Analysis: Hybrid Systems, and IEEE Control Systems Letters.
Video Link: https://www.youtube.com/watch?v=F2lN_TEA1hI
Florian Dörfler (ETH Zurich)
Title: LQR Learning Pipelines
Abstract: The linear quadratic regulator (LQR) problem is a cornerstone of automatic control, and it has been widely studied in the data-driven setting. In the first part of the talk, we show how to bridge different problem formulations and propose a novel, direct, and regularized version of the LQR. We start from indirect certainty-equivalence LQR, i.e., least-squares identification of state-space matrices followed by a nominal model-based design, formalized as a bi-level program. We show how to transform this problem into a single-level, regularized, and direct data-driven control formulation, where different regularizers account for least-squares data fitting or robustifying variance reduction. For this novel formulation, we carry out a robustness and performance analysis in the presence of noise. In the second part of the talk, we propose an adaptive method to learn this solution online. By adaptive, we mean an online method that uses closed-loop data in a non-episodic fashion and admits a recursive algorithmic implementation. Our approach is based on a covariance parameterization of the direct, data-driven, and regularized LQR and an explicit calculation of the policy gradient using a batch of persistently exciting data. We establish the global convergence of our method via a projected gradient dominance property in the presence of bounded noise. Finally, all of our theoretical results are validated with simulations and experiments in different domains, demonstrating the computational and sample efficiency of our method.
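The indirect certainty-equivalence pipeline that the talk takes as its starting point (least-squares identification followed by a nominal model-based LQR design) can be sketched in a few lines. The system matrices, data length, and weights below are illustrative assumptions, and the data here is noise-free, unlike the noisy setting analyzed in the talk.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# True (unknown) system, used only to generate data
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Collect data from x_{t+1} = A x_t + B u_t under persistently exciting inputs
T = 200
X = np.zeros((2, T + 1))
U = rng.normal(size=(1, T))
for t in range(T):
    X[:, t + 1] = A_true @ X[:, t] + B_true[:, 0] * U[0, t]

# Step 1 (indirect pipeline): least-squares identification of (A, B)
Z = np.vstack([X[:, :T], U])              # stacked regressors [x_t; u_t]
Theta = X[:, 1:] @ np.linalg.pinv(Z)      # [A_hat, B_hat]
A_hat, B_hat = Theta[:, :2], Theta[:, 2:]

# Step 2: nominal (certainty-equivalence) LQR design on the identified model
Q, R = np.eye(2), np.eye(1)
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

# The closed loop x_{t+1} = (A - B K) x_t should be stable
rho = max(abs(np.linalg.eigvals(A_true - B_true @ K)))
print(f"closed-loop spectral radius: {rho:.3f}")  # below 1
```

With noise-free, persistently exciting data the identification step is exact; the talk's regularized, direct formulation is precisely about what happens when it is not.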
Bio: Florian Dörfler is a Professor at the Automatic Control Laboratory at ETH Zürich. He received his Ph.D. degree in Mechanical Engineering from the University of California at Santa Barbara in 2013, and a Diplom degree in Engineering Cybernetics from the University of Stuttgart in 2008. From 2013 to 2014 he was an Assistant Professor at the University of California, Los Angeles. He served as Associate Head of the ETH Zürich Department of Information Technology and Electrical Engineering from 2021 to 2022. His research interests are centered around automatic control, system theory, optimization, and learning. His particular foci are on network systems, data-driven settings, and applications to power systems. He is a recipient of the distinguished young researcher awards from IFAC (Manfred Thoma Medal 2020) and EUCA (European Control Award 2020), as well as the 2025 Rössler Prize, the highest scientific award across all disciplines at ETH Zürich. He and his team have received best paper distinctions in the top venues of control, machine learning, power systems, power electronics, and circuits and systems. In particular, they were recipients of the 2011 O. Hugo Schuck Best Paper Award, the 2012-2014 Automatica Best Paper Award, the 2016 IEEE Circuits and Systems Guillemin-Cauer Best Paper Award, the 2022 IEEE Transactions on Power Electronics Prize Paper Award, the 2024 Control Systems Magazine Outstanding Paper Award, and multiple Best PhD Thesis Awards at UC Santa Barbara and ETH Zürich.
They were further winners or finalists for Best Student Paper awards at the European Control Conference (2013, 2019), the American Control Conference (2010, 2016, 2024), the Conference on Decision and Control (2020), the PES General Meeting (2020), the PES PowerTech Conference (2017), and the International Conference on Intelligent Transportation Systems (2021), as well as for the IEEE CSS Swiss Chapter Young Author Best Journal Paper Award (2022, 2024, 2025), at the IFAC Conferences on Nonlinear Model Predictive Control (2024) and Cyber-Physical-Human Systems (2024), and for an Oral at NeurIPS (2024). He is currently serving on the council of the European Control Association and as a senior editor of Automatica.
Video Link: https://www.youtube.com/watch?v=A6r21xKzVv0
Rasoul Etesami (UIUC)
Title: Learning Stationary Nash Equilibrium Policies in n-Player Stochastic Games with Independent Chains
Abstract: Motivated by applications such as bandwidth allocation in wireless communication networks and energy management in smart grids, we consider a subclass of n-player stochastic games in which players have their own internal state spaces while being coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players receive only realizations of their payoffs and cannot observe each other's states or actions. For this class of stochastic games, we first show that finding a stationary Nash equilibrium (NE) policy is intractable without additional assumptions on the reward functions. Nevertheless, for general reward functions, we develop polynomial-time learning algorithms based on dual averaging that converge to the set of ε-NE policies in terms of the averaged Nikaido-Isoda distance, either almost surely or with high probability. We further extend our results to settings with unknown transition probabilities. In particular, under additional assumptions on the reward functions or the equilibrium structure, we show that our algorithms indeed converge to an ε-NE policy with high probability in polynomial time. Finally, we demonstrate the effectiveness of the proposed algorithms through numerical experiments on energy management in smart grids.
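As a toy illustration of dual-averaging game dynamics, in a far simpler setting than the talk's (a two-player zero-sum matrix game with full-information rather than partial feedback, and no internal Markov chains), entropic dual averaging on matching pennies drives the time-averaged strategies toward the uniform Nash equilibrium:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# Matching pennies: zero-sum payoffs for player 1; the unique Nash
# equilibrium is the uniform mixed strategy for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

g1 = np.array([0.3, 0.0])   # cumulative payoff vectors (dual variables);
g2 = np.zeros(2)            # slightly asymmetric start so the dynamics move
avg1, avg2 = np.zeros(2), np.zeros(2)
T = 20000
for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)  # decreasing step size
    x = softmax(eta * g1)   # dual averaging with an entropic regularizer
    y = softmax(eta * g2)
    g1 += A @ y             # player 1 maximizes x^T A y
    g2 -= A.T @ x           # player 2 minimizes it
    avg1 += x
    avg2 += y

avg1 /= T
avg2 /= T
print(np.round(avg1, 2), np.round(avg2, 2))  # time averages near [0.5, 0.5]
```

By the standard regret argument for zero-sum games, the averaged strategies form an approximate equilibrium with a gap that shrinks like O(1/sqrt(T)); the talk's results concern the much harder setting where transitions are unknown and only payoff realizations are observed.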
Bio: Rasoul Etesami is an Associate Professor in the Department of Industrial and Systems Engineering at the University of Illinois Urbana-Champaign (UIUC), where he is also affiliated with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory. Prior to joining the faculty at UIUC, he was a Postdoctoral Research Fellow in the Department of Electrical Engineering at Princeton University and WINLAB. He received his Ph.D. in Electrical and Computer Engineering from UIUC in 2015. His research interests include the analysis of complex socioeconomic and decision-making systems using tools from control theory, game theory, optimization, and learning theory. He is the recipient of the Best CSL Ph.D. Thesis Award at UIUC in 2016, the Springer Outstanding Ph.D. Thesis Award in 2017, the NSF CAREER Award in 2020, the US Air Force Young Investigator Award in 2023, the SIAM Journal on Control and Optimization Best Paper Award in 2025, and multiple best teaching and reviewer awards. He has served as General Chair for conferences such as the Annual Allerton Conference and the C3.ai DTI Workshop, and as an Associate Editor for journals including IET Smart Grid and Dynamic Games and Applications.
Video Link: https://www.youtube.com/watch?v=B1nG-Sol0fg
Eric Mazumdar (Caltech)
Title: Principled Algorithm Design for Multi-Agent Learning
Abstract: Machine learning algorithms are increasingly being deployed in environments in which they must interact with other strategic agents, such as algorithms and people with potentially misaligned objectives. While the presence of these strategic interactions creates new challenges for learning algorithms, it also gives rise to new opportunities for algorithm design. In this talk, I will discuss how ideas from economics can give us new insights into the analysis and design of machine learning algorithms for these real-world environments.
First, I will discuss work on understanding how to use function approximation in multi-agent reinforcement learning. Our work challenges the idea that more expressive models, more data, and more compute always improve performance in such settings. Instead, we show a Braess-paradox-like phenomenon: even when one has access to infinite data, strategic interactions can make smaller and less expressive model classes yield better equilibrium outcomes.
I will then discuss a line of work on using models of human decision-making from behavioral economics in multi-agent reinforcement learning. In particular, by introducing a form of strategic risk aversion (as well as bounded rationality) into games, we derive a class of game-theoretic equilibria that can be efficiently computed in all finite-horizon Markov games. This allows us to develop algorithms with strong guarantees of convergence for any multi-agent system. Furthermore, we show that these equilibria capture human play in a variety of game-theoretic experiments conducted in behavioral economics. I will conclude by highlighting how these ideas allow us to derive a new stationary solution concept for infinite-horizon Markov games that can be efficiently computed, setting the stage for new classes of algorithms for robotics and multi-agent learning more broadly.
Bio: Eric Mazumdar is an Assistant Professor in Computing and Mathematical Sciences and Economics at Caltech. His research lies at the intersection of machine learning and economics, where he is broadly interested in developing the tools and understanding necessary to confidently deploy machine learning algorithms in societal-scale systems. Eric is the recipient of an NSF CAREER Award and was a fellow at the Simons Institute for the Theory of Computing for the semester on Learning in Games. He obtained his Ph.D. in Electrical Engineering and Computer Science at UC Berkeley, where he was advised by Michael Jordan and Shankar Sastry, and received his B.S. in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT).
Video Link: Coming Soon
Thinh Doan (UT Austin)
Title: Multi-Time-Scale Stochastic Approximation as a Tool for Multi-Agent Learning and Distributed Optimization
Abstract: Multi-time-scale stochastic approximation (SA) is a powerful generalization of the classic SA method for finding roots (or fixed points) of coupled nonlinear operators. It has attracted considerable attention due to its broad applications in multi-agent learning, control, and optimization. In this framework, multiple iterates are updated simultaneously but with different step sizes, whose ratios loosely define their time-scale separation. Empirical studies and theoretical insights have shown that such heterogeneous step sizes can lead to improved performance compared to single-time-scale (or classical) SA schemes. However, despite these advantages, existing results indicate that multi-time-scale SA typically achieves only a suboptimal convergence rate, slower than the optimal rate attainable by its single-time-scale counterpart. In this talk, I will present our recent work on characterizing the convergence complexity of multi-time-scale SA. We develop a novel variant of this method and establish new finite-sample guarantees that achieve the optimal O(1/k) convergence rate. Building upon these results, I will also discuss how these advances enable the design of efficient algorithms for key problems in multi-agent learning and distributed optimization over networks.
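The two-iterate structure can be illustrated with a scalar toy problem (an assumed example, not the algorithm from the talk): a fast iterate with larger step sizes tracks a coupled target, while a slow iterate uses the fast one to find its own root.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 3.0  # root the slow iterate should find: x* = c

x, y = 0.0, 0.0
for k in range(1, 20001):
    beta = 1.0 / k**0.6   # fast step size
    alpha = 1.0 / k       # slow step size; alpha/beta -> 0 (time-scale separation)
    # Fast iterate tracks the coupled target y*(x) = (x + c) / 2 under noise
    y += beta * ((x + c) / 2.0 - y + rng.normal(scale=0.1))
    # Slow iterate treats y as if already equilibrated: x = y*(x) implies x = c
    x += alpha * (y - x + rng.normal(scale=0.1))

print(f"x = {x:.2f}, y = {y:.2f}")  # both near c = 3
```

The ratio alpha/beta tends to zero, which is what makes this a two-time-scale scheme; the talk concerns how to choose such schedules to recover the optimal O(1/k) rate.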
Bio: Thinh T. Doan is an Assistant Professor in the Department of Aerospace Engineering and Engineering Mechanics at UT Austin. He received his undergraduate degree from Hanoi University of Science and Technology in 2008, his M.S. from the University of Oklahoma, and his Ph.D. from UIUC, all in Electrical and Computer Engineering. At Illinois, he received the Harriett and Robert Perry Fellowship Award in 2016 and 2017. From 2018 to 2020, he was a TRIAD postdoctoral fellow at the Georgia Institute of Technology. Before joining UT, he was an Assistant Professor in the ECE Department at Virginia Tech. He received the AFOSR YIP and NSF CAREER awards in 2024.
Video Link: https://www.youtube.com/watch?v=8ZALX4hKieE
Sophie Hall (ETH Zurich)
Title: Receding Horizon Games: System Theory, Learning Equilibria and Applications
Abstract: Game-theoretic MPC (or Receding Horizon Games) is an emerging control methodology for multi-agent systems that generates control actions by solving a dynamic game with coupling constraints in a receding-horizon fashion. This control paradigm has recently received increasing attention in various application fields, including robotics, autonomous driving, traffic networks, and energy grids, due to its ability to model the competitive nature of self-interested agents with shared resources while incorporating future predictions, dynamic models, and constraints into the decision-making process.
In this talk, we will: (i) motivate Receding Horizon Games through dynamic resource allocation problems (energy, water, traffic, etc.); (ii) place them in the context of advanced MPC methods and explain why a game-theoretic perspective is needed; (iii) present the control framework and its solution concept based on Generalized Nash Equilibria; (iv) demonstrate how competitive agents learn to play steady-state GNEs based on dissipativity and turnpike theory; and (v) present application results in energy management.
Bio: Sophie Hall has been a PhD student at the Automatic Control Laboratory at ETH Zürich since May 2021, working in Prof. Dörfler's group. She completed her undergraduate studies in Mechanical Engineering, focusing on medical control and signal processing, at the University of Surrey, UK, and Nanyang Technological University, Singapore. In 2021, she obtained an MSc in Biomedical Engineering from ETH Zürich, specializing in modeling and control. During her master's studies, she conducted research on Gaussian processes for control in Prof. Zeilinger's group and worked on real-time MPC schemes in Prof. Dörfler's group. Her PhD research focuses on game-theoretic MPC, its theoretical closed-loop properties, and energy and groundwater applications.
Video Link: https://www.youtube.com/watch?v=K6v5EF9TTx4
Shangding Gu (UCB)
Title: Toward Safe and Scalable Reinforcement Learning in Robotic Systems and Beyond
Abstract: As artificial intelligence continues to evolve, reinforcement learning has emerged as a powerful tool for training agents to make decisions in complex environments. However, ensuring safety during the learning and deployment phases remains a critical challenge, especially in high-stakes domains like autonomous driving and robotics. This talk will introduce the fundamental concepts behind safe reinforcement learning, discuss why safety is vital in real-world applications, and highlight key methodologies aimed at ensuring safety during the learning process.
Bio: Shangding Gu is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received his Ph.D. in Computer Science from the Technical University of Munich in 2024. His research has been supported by multiple grants and scholarships from organizations including Nvidia, OpenAI, and Bosch. He was the founding chair of the 1st International Safe Reinforcement Learning Workshop at IEEE MFI 2022 and has organized several international workshops. He also serves as a lead guest editor for IEEE Transactions on Automation Science and Engineering. His research interests lie at the intersection of reinforcement learning, robotics, and large language models. More info can be found at: https://people.eecs.berkeley.edu/~shangding.gu/
Niao He (ETH Zurich)
Title: Scaling Multi-Agent Reinforcement Learning to the Mean-Field Regime
Abstract: Reinforcement Learning (RL) has achieved remarkable success especially when combined with deep learning; however, scaling RL beyond the single-agent setting remains a major challenge. In particular, the “curse of many agents” hinders the application of RL to systems with thousands or even millions of interacting participants. Such large-scale problems arise naturally in domains like financial markets, auctions, traffic/resource management, and social systems, where optimal decision-making and computation quickly become intractable. We explore mean-field reinforcement learning (MF-RL) as a principled framework to address this challenge under the agent exchangeability assumption. Our work extends the theoretical foundations of MF-RL with an emphasis on computational aspects and real-world applicability. Specifically, we analyze mean-field approximation properties, study communication and coordination bottlenecks during learning, and examine the computational and statistical complexity of scaling RL to the mean-field regime. Finally, we highlight applications to large-scale incentive design and resource allocation, demonstrating how MF-RL can serve as a bridge between mean-field theory and practical multi-agent RL algorithms.
Bio: Niao He is an Associate Professor in the Department of Computer Science at ETH Zurich, where she leads the Optimization and Decision Intelligence (ODI) Group. She is the co-director of the ETH-Max Planck Center of Learning Systems and a core faculty member of the ETH AI Center. Previously, she was an assistant professor at the University of Illinois at Urbana-Champaign from 2016 to 2020. Before that, she received her Ph.D. degree in Operations Research from the Georgia Institute of Technology in 2015. Her research interests lie in large-scale optimization and reinforcement learning, with a primary focus on theoretical and algorithmic foundations for principled, scalable, and trustworthy decision intelligence. She is a recipient of the AISTATS Best Paper Award, the NSF CRII Award, and an SNSF Starting Grant, among others. She serves as an associate editor for the SIAM Journal on Optimization and the Journal of Optimization Theory and Applications, and regularly as a (senior) area chair for NeurIPS, ICLR, ICML, and other machine learning conferences.
Video Link: https://www.youtube.com/watch?v=lARqCicJdZ8
Rahul Mangharam, Hongrui Zheng and Nandan Tumu from UPenn xLAB
Title: MAD Games: Multi-agent Dynamic Games in Physical World
Abstract: This talk presents three research topics on multi-agent games from xLAB at the University of Pennsylvania.
Distributionally Robust Online Adaptation via Offline Population Synthesis
Rahul Mangharam: The critical challenge in deploying autonomous systems is achieving peak performance without compromising safety. Autonomous racing crystallizes this challenge, as it punishes conservative policies and demands robust, adaptive strategies in multi-agent settings. Current approaches often fail by either oversimplifying the behavior of other agents or lacking mechanisms for real-time adaptation. We present a robust adaptive approach to balance safety and performance in dynamic multi-agent games.
Signal Temporal Logic Games in Adversarial Multi-Agent Systems
Hongrui (Billy) Zheng: We introduce STLGame, a framework that synthesizes robust policies for autonomous agents under Signal Temporal Logic (STL) tasks in adversarial settings. Using fictitious self-play with a gradient-based method for differentiable STL formulas, STLGame converges to nearly unexploitable policies.
The Social Influence Game: Modeling opinion dynamics under external influence
Nandan Tumu: The modern social network is a battleground where a multitude of external actors seek to exert their influence. The Social Influence Game models this many-player setting, in which external actors seek to sway the opinion of the social network through strategic allocation of a limited influence budget. In our initial work, we identify the complexity class to which computing a best response belongs under DeGroot dynamics, and present a computationally efficient approximation.
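As a minimal sketch of the dynamics involved, here is plain DeGroot averaging followed by a single external influencer with a fixed per-agent weight; the trust matrix, influence weight, and target opinion are assumptions for illustration, not the many-player, budget-constrained model of the talk.

```python
import numpy as np

# Row-stochastic trust matrix of a 3-agent network (illustrative values)
W = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])
x0 = np.array([0.0, 0.5, 1.0])  # initial opinions

# Plain DeGroot dynamics: repeated trust-weighted averaging reaches consensus
x = x0.copy()
for _ in range(200):
    x = W @ x

# One external actor injecting opinion s with influence weight eps on every
# agent (a single-influencer simplification of the many-player game)
eps, s = 0.1, 1.0
y = x0.copy()
for _ in range(500):
    y = (1 - eps) * (W @ y) + eps * s

print(np.round(x, 3), np.round(y, 3))  # consensus; opinions pulled toward s
```

With every agent exposed to the influencer, the influenced dynamics contract toward the injected opinion s; the interesting strategic questions arise when several actors with limited budgets target different agents.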
Video Link: https://youtu.be/66i2sgj1Ee4