Upcoming & Past Seminars

Past Speakers

(Recordings available on our YouTube channel)

Kalesha Bullard

(DeepMind)

Date: December 16, 2021

Title: Multi-Agent Reinforcement Learning towards Zero-Shot Emergent Communication

Abstract: Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings, in which AI agents coexist in shared environments with other agents (artificial or human). Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting, however, is that it does not allow the emergent protocols to generalise beyond the training partners. Furthermore, the typical problem setting of discrete cheap-talk channels may be less appropriate for embodied agents that communicate implicitly through physical action. This talk presents research that investigates methods for enabling AI agents to learn general communication skills through interaction with other artificial agents. In particular, the talk will focus on my recent work within Multi-Agent Reinforcement Learning, investigating emergent communication protocols inspired by communication in more realistic settings. We present a novel problem setting and a general approach that allows for zero-shot communication (ZSC), i.e., the emergence of communication protocols that can generalise to independently trained agents. We also explore and analyse specific difficulties associated with finding globally optimal ZSC protocols, as the complexity of the communication task increases or the modality for communication changes (e.g., from symbolic communication to implicit communication through the physical movement of an embodied artificial agent). Overall, this work opens up exciting avenues for learning general communication protocols in more complex domains.
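
As a concrete picture of the discrete cheap-talk setting above, here is a minimal sketch of a one-step referential (Lewis signaling) game in Python. The environment size, payoffs, and tabular protocols are illustrative assumptions, not the setup from the talk; the zero-shot communication question shows up in the cross-play pairing at the end, where a protocol that works in self-play fails with an independently chosen partner.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3  # number of states == messages == actions (illustrative choice)

def play_round(speaker, listener):
    """One round of a Lewis signaling game over a discrete cheap-talk
    channel: the speaker observes the state and emits a message; the
    listener sees only the message and picks an action. Both agents
    are rewarded iff the action matches the state."""
    state = rng.integers(N)
    message = speaker[state]    # tabular speaker protocol: state -> message
    action = listener[message]  # tabular listener protocol: message -> action
    return float(action == state)

# Two independently chosen protocols, each internally consistent.
speaker_a, listener_a = np.array([0, 1, 2]), np.array([0, 1, 2])
speaker_b, listener_b = np.array([2, 0, 1]), np.array([1, 2, 0])

# Self-play succeeds, but cross-play (the zero-shot pairing) fails:
print(np.mean([play_round(speaker_a, listener_a) for _ in range(1000)]))  # ~1.0
print(np.mean([play_round(speaker_a, listener_b) for _ in range(1000)]))  # ~0.0
```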

Amy Greenwald

(Brown University)

Date: December 2, 2021

Title: Learning Equilibria via Regret Minimization in Extensive-Form Games

Abstract: The convergence of \Phi-regret-minimization algorithms in self-play to \Phi-equilibria is well understood in normal-form games (NFGs), where \Phi is the set of deviation strategies. This talk investigates the analogous relationship in extensive-form games (EFGs). While the primary choices for \Phi in NFGs are internal and external regret, leading to convergence in self-play to correlated and coarse correlated equilibria, respectively, the space of possible deviations in EFGs is much richer. We restrict attention to a class of deviations known as behavioral deviations, inspired by von Stengel and Forges' deviation player, which they introduced when defining extensive-form correlated equilibria (EFCE). We then propose extensive-form regret minimization (EFR), a regret-minimizing learning algorithm whose complexity scales with the complexity of \Phi, and which converges in self-play to EFCE when \Phi is the set of behavioral deviations. Von Stengel and Forges, Zinkevich et al., and Celli et al. all weaken the deviation player in various ways, and then derive corresponding efficient equilibrium-finding algorithms. These weakenings (and others) can be seamlessly encoded into EFR at runtime, by simply defining an appropriate set of deviation strategies \Phi. The result is a class of efficient \Phi-equilibrium-finding algorithms for EFGs.

This work is the product of Dustin Morrill's Ph.D. thesis.

Other collaborators include Michael Bowling, Marc Lanctot, Ryan D'Orazio, Reca Sarfati, and James R. Wright.
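
For readers who want the NFG baseline made concrete, the sketch below implements regret matching, a standard external-regret minimizer, in self-play on rock-paper-scissors. It is a generic textbook construction under illustrative parameters, not code from the EFR work; it illustrates the first sentence of the abstract, where vanishing external regret for all players drives average play toward the coarse correlated equilibria.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (zero-sum).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def regret_matching(A, iters=20000, seed=0):
    """Self-play regret matching: each player mixes proportionally to
    positive cumulative (external) regret; the time-averaged strategies
    converge to the set of coarse correlated equilibria."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    regret = [np.zeros(n), np.zeros(n)]
    avg = [np.zeros(n), np.zeros(n)]
    for _ in range(iters):
        # Current strategies from positive-regret matching (uniform fallback).
        strat = []
        for r in regret:
            pos = np.maximum(r, 0.0)
            strat.append(pos / pos.sum() if pos.sum() > 0 else np.full(n, 1.0 / n))
        a0 = rng.choice(n, p=strat[0])
        a1 = rng.choice(n, p=strat[1])
        u0 = A[:, a1]   # row player's payoff for each action vs. a1
        u1 = -A[a0, :]  # column player's payoff (zero-sum) vs. a0
        regret[0] += u0 - u0[a0]
        regret[1] += u1 - u1[a1]
        for p in range(2):
            avg[p] += strat[p]
    return [s / iters for s in avg]

print(regret_matching(A))  # both average strategies approach (1/3, 1/3, 1/3)
```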

Branislav Bosansky

(Czech Technical University)

Date: November 18, 2021

Title: Solving Dynamic Games with Imperfect Information

Abstract: Finding optimal strategies for dynamic multi-agent interactions, where agents have only partial observations of the environment, is one of today's key challenges. Even cases with two agents and strictly competitive interactions (i.e., zero-sum games) are difficult -- especially interactions that require many turns to complete or have no bound on the number of turns. At the same time, we want algorithms with bounded error, so that we know how close to (or far from) the optimum the computed strategies are.

From the game-theoretic perspective, we can model two-agent strictly competitive interactions as zero-sum partially observable stochastic games (zs-POSGs) with an infinite or indefinite horizon. Since zs-POSGs can be undecidable in general, we impose further restrictions that allow us to design and implement search algorithms that are guaranteed to converge to optimal strategies with a bounded error. Our algorithms are inspired by Heuristic Search Value Iteration (HSVI) for partially observable Markov decision processes (POMDPs), but significantly modified to solve games where (a) only one player has partial information, or (b) both players have partial information but all observations are public.
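
To make the game-theoretic backbone concrete: solvers in this family typically need the exact values of zero-sum (stage) games, which can be computed by linear programming. The sketch below is a minimal, generic version of that classic subroutine; the scipy usage and toy matrix are my own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Value and maximin strategy of the row player in the zero-sum
    matrix game A (row player maximizes). Variables z = (x, v):
    maximize v subject to (x' A)_j >= v for every column j."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v - (x' A)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                             # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

x, v = solve_zero_sum(np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]]))
print(x, v)  # ~[1/3, 1/3, 1/3], value ~0 for rock-paper-scissors
```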

In the talk, I will describe the key characteristics and the common schema of our algorithms, and identify future directions for scaling them up and/or generalizing them.

Marta Garnelo

(DeepMind)

Date: November 4, 2021

Title: Learning dynamics of agent populations

Abstract: Multi-agent systems have a wide range of applications, from image generation (GANs) and AI for games (StarCraft, chess…) to the simulation of real-world problems (self-driving cars).

However, when moving from single-agent to multi-agent systems one encounters several challenges, such as the lack of an absolute performance measure and the need for opponents to train against. These traits make training agents in multi-agent environments a difficult task. In an attempt to overcome such limitations, some researchers are instead turning towards open-ended methods and considering how to design the underlying learning dynamics.

In this talk we focus on agent populations as a case study for understanding the learning dynamics of these systems. We discuss the challenges of evaluating performance (both at the single-agent level and at the population level) as well as different approaches to training strong populations of agents.

Marc Lanctot

(DeepMind)

Date: July 22, 2021

Title: Game-Theoretic Approaches for Multiagent Reinforcement Learning in Partially Observable Environments, and a Plea to “Go Wide”

Abstract: In many situations, agents have only partial knowledge of the true state of the world, and yet make decisions based on their observations alone. Outcomes can depend critically on hidden information. This is exemplified in the game of Poker, a standard AI benchmark in the field, and is important in other large challenge domains such as StarCraft 2, Dota 2, Capture-the-Flag, and Hanabi. The first part of the talk will take a tour through the recent game-theoretic approaches to RL in partially observable games. While motivated mainly by applications to two-player zero-sum games, these approaches have been the basis for novel theoretical developments and have inspired new algorithms for several more general settings, including the large-scale seven-player challenge domain Diplomacy. The second part of the talk will motivate exploring even further beyond specific focus areas, particularly via evaluation methodologies that compare agents across a wide variety of domains.

Roxana Rădulescu

(Vrije Universiteit Brussel)

Date: June 24, 2021

Title: Decision Making in Multi-Objective Multi-Agent Settings

Abstract: The prevalence of artificial agents in our world raises the need to ensure that they can handle the salient properties of their environment in order to plan or learn how to solve specific tasks. In the first part of this talk, we discuss how the learning and decision-making processes of agents can be formalised and approached when multiple agents are involved and when multiple objectives need to be considered. To analyse such problems, we adopt a utility-based perspective and advocate that compromises between competing objectives should be made on the basis of the utility that these compromises have for the users; in other words, they should depend on the desirability of the outcomes. The second part of this talk will discuss how opponent modelling can be approached in multi-agent multi-objective settings and present a few results in multi-objective normal-form games.
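
A tiny sketch of the utility-based perspective: identical vector-valued outcomes can yield different preferred compromises under different user utility functions. The objectives, numbers, and utility functions below are invented purely for illustration.

```python
import numpy as np

# Vector payoffs over two objectives: (reward, cost). Illustrative numbers.
outcomes = {
    "option_a": np.array([5.0, -4.0]),  # high reward, high cost
    "option_b": np.array([3.0, -0.5]),  # modest reward, low cost
}

def linear_utility(v, w=np.array([0.7, 0.3])):
    """A user who trades off the objectives linearly."""
    return float(w @ v)

def cost_averse_utility(v):
    """A user for whom large costs hurt disproportionately (quadratic penalty)."""
    return float(v[0] - 0.5 * v[1] ** 2)

for name, utility in [("linear", linear_utility), ("cost-averse", cost_averse_utility)]:
    best = max(outcomes, key=lambda k: utility(outcomes[k]))
    print(f"{name:>11} user prefers {best}")
# linear user prefers option_a; cost-averse user prefers option_b
```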

Matthew E. Taylor

(University of Alberta)

Date: May 27, 2021

Title: Help an agent out: How should agent-agent and human-agent teams learn to work together?

Abstract: As more reinforcement learning agents are deployed, they will need to learn to collaborate with other agents and humans. The first part of this talk will overview a line of research where existing agents or humans can teach new agents to quickly learn in single- and multi-agent environments. The second part of this talk will discuss how human-agent teams can outperform agent-only and human-only teams, but where additional research is critically needed.

Francisco Santos

(Instituto Superior Técnico)

Date: March 18, 2021

Title: Dynamics of cooperation in large-scale multiagent systems

Abstract: Cooperation remains one of the major scientific challenges of our century. It is challenging not only to understand the mechanisms underlying the emergence of cooperation in nature, but also to apply this understanding to foster pro-sociality in situations where cooperation remains elusive. Several subtleties of modern human interactions add both opportunities and difficulties to understanding cooperation: large-scale reputation systems, information overload, scientific uncertainty, networks of interaction, decentralized governance and hybrid human-agent populations all impact cooperation dynamics and open new routes to steering them. In this seminar, I will resort to game theory and large populations of adaptive agents to discuss some of these challenges. I shall analyze key aspects of human collective action, such as the effects of cognitive complexity, reputations, and the social norms that establish what characterizes a good or bad action. I will also discuss how the same framework may be useful for modelling cooperation in more complex scenarios, such as environmental governance, where individuals face a social dilemma shaped by uncertain and future returns. I will discuss whether a polycentric structure of multiple small-scale agreements provides a viable solution to global dilemmas, and the advantages and disadvantages of different institutional layouts. Finally, I will address the impact of scientific uncertainty on the polarization of preferences, and how autonomous machines may sway individuals' propensity to free-ride on the effort of others.
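
As a minimal instance of the evolutionary game dynamics underlying this line of work, the sketch below integrates the replicator dynamics for a two-strategy donation game in a well-mixed population; cooperation is driven extinct, which is precisely the baseline that mechanisms such as reputations and social norms are meant to overcome. The payoff values and step size are illustrative assumptions.

```python
import numpy as np

# Donation game: a cooperator pays cost c to give benefit b to the partner.
b, c = 3.0, 1.0
#                   vs C    vs D
payoff = np.array([[b - c,  -c ],   # cooperator
                   [b,      0.0]])  # defector

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator dynamics x_i' = x_i (f_i - f_bar)."""
    f = payoff @ x  # expected payoff of each strategy in population x
    return x + dt * x * (f - x @ f)

x = np.array([0.9, 0.1])  # start from 90% cooperators
for _ in range(2000):
    x = replicator_step(x)
print(x)  # defectors take over the well-mixed population: x -> [0, 1]
```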

Jakob Foerster

(University of Toronto)

Date: March 11, 2021

Title: Learning to Cooperate, Communicate and Coordinate (with Humans)

Abstract: In recent years we have seen rapid progress on a number of zero-sum benchmark problems in artificial intelligence, e.g. Go, Poker, and Dota. In contrast to these competitive settings, success in the real world typically requires humans, and will require AI agents, to cooperate, communicate and coordinate with others. Crucially, from a learning point of view, these three Cs require fundamentally novel approaches, methods, and theory, the development of which has been at the heart of my research agenda.

In my talk I will cover recent progress, including how agents can learn to entice others to cooperate in settings of conflicting goals by accounting for their learning behavior, how they can learn to communicate by reasoning over (public) beliefs and how they can learn policies that can coordinate with other agents at test time by exploiting the symmetries in the environment.

I will finish the talk by outlining some of the promising directions for future work.

Peter Stone

(University of Texas at Austin & Sony AI)

Date: February 25, 2021

Title: Topics in Multiagent Learning Motivated by Ad Hoc Teamwork

Abstract: As autonomous agents proliferate in the real world, both in software and robotic settings, they will increasingly need to band together for cooperative activities with previously unfamiliar teammates. In such "ad hoc" team settings, team strategies cannot be developed a priori. Rather, an agent must be prepared to cooperate with many types of teammates: it must collaborate without pre-coordination. This talk will cover past and ongoing research on multiagent learning, much of which has been motivated by the ad hoc teamwork challenge.

Craig Boutilier

(Google Research)

Date: February 11, 2021

Title: Maximizing User Social Welfare in Recommender Ecosystems

Abstract: An important goal for recommender systems is to make recommendations that maximize some form of user utility over (ideally, extended periods of) time. While reinforcement learning has started to find limited application in recommendation settings, for the most part, practical recommender systems remain "myopic" (i.e., focused on immediate user responses). Moreover, they are "local" in the sense that they rarely consider the impact that a recommendation made to one user may have on the ability to serve other users. These latter "ecosystem effects" play a critical role in optimizing long-term user utility. In this talk, I describe some recent work we have been doing to optimize user utility and social welfare using reinforcement learning and equilibrium modeling of the recommender ecosystem; draw connections between these models and notions such as fairness and incentive design; and outline some future challenges for the community.

Michael Bowling

(DeepMind)

Date: January 28, 2021

Title: Hindsight Rationality: Alternatives to Nash

Abstract: I will look at some of the often unstated principles common in multiagent learning research, suggesting that they may be responsible for holding us back. In response, I will offer an alternative set of principles, which leads to the view of hindsight rationality, with connections to online learning and correlated equilibria. I will then describe some recent technical work that better characterizes the relationships between different notions of hindsight rationality, as well as how we can build increasingly powerful algorithms for sequential decision-making settings.
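
For reference, the connection to online learning and correlated equilibria can be stated compactly. A standard statement (my paraphrase, not a definition from the talk): if every player's average regret against a set of deviations vanishes, the empirical distribution of joint play approaches the corresponding equilibrium set; for external deviations this set is the coarse correlated equilibria, defined by

```latex
% A distribution \mu over joint actions is a coarse correlated equilibrium
% if no player i can gain by committing in advance to any fixed action a_i':
\mathbb{E}_{a \sim \mu}\!\left[u_i(a)\right] \;\ge\;
\mathbb{E}_{a \sim \mu}\!\left[u_i(a_i', a_{-i})\right]
\qquad \forall i,\ \forall a_i'.
```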