The Schedule
(Tentative)
8:30 - 8:40
8:40 - 9:10
Ali Jadbabaie (MIT)
Abstract: In this talk, I will review some of my group's work on information aggregation, social learning, and decentralized estimation in networks, drawing on literature from statistics, economics, signal processing, and control. The goal of this line of research is to understand the statistical and computational complexities of information aggregation and learning using heterogeneous, disparate sources.
9:10 - 9:40
Chi Jin (Princeton)
Abstract: While classical game theory primarily focuses on finding equilibria, modern machine learning applications introduce a series of new challenges where standard equilibrium notions are no longer sufficient, and the development of new efficient algorithmic solutions is urgently needed. In this talk, we will demonstrate two such scenarios. (1) A natural goal in multiagent learning is to learn rationalizable behavior, which avoids iteratively dominated actions. Unfortunately, such rationalizability is not guaranteed by standard equilibria, especially when approximation errors are present. Our work presents the first line of efficient algorithms for learning rationalizable equilibria with sample complexities that are polynomial in all problem parameters, including the number of players. (2) In multiplayer symmetric constant-sum games such as Mahjong or Poker, a natural baseline is to achieve an equal share of the total reward. We demonstrate that the self-play meta-algorithms used by existing state-of-the-art systems can fail to achieve this simple baseline in general symmetric games. We will then discuss the new principled solution concept required to achieve this goal.
9:40 - 10:10
Cedric Langbort (UIUC)
Abstract: Control strategies that rely on shaping the information available to a decision-maker have become popular recently in robotics, engineering, computer science, and information economics. The canonical model for such strategic information transmission is known as Bayesian Persuasion and was introduced in the latter field in 2011 by Kamenica and Gentzkow. As the name suggests, this setting assumes that the decision-maker whose information set is influenced acts fully rationally, and according to Bayes' rule, upon receiving messages from the sender/controller/influencer. This assumption not only delineates the kind of situations captured by the model but is also, as it turns out, central to its tractability. Indeed, Bayes' rule and the fact that the decision-maker acts optimally make two sets of tools – the so-called "concavification procedure" and the "revelation principle" – available to reformulate the sender's problem in an attractive form. In this talk, we present some recent advances on Strategic Information Transmission problems of this kind that account for a number of non-idealities on the part of the receiver, including a learning receiver. Most of this work is joint with Olivier Massicot.
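For reference, the concavification result mentioned above can be stated compactly in standard Bayesian-persuasion notation (the symbols below are generic, not taken from the talk):

```latex
% v(\mu): sender's payoff when the receiver best-responds to posterior belief \mu
% over the state space \Theta; \mu_0: common prior. Any signaling scheme induces a
% distribution \tau over posteriors whose mean is \mu_0 (Bayes plausibility), so the
% sender's optimal value is
\[
  V(\mu_0) \;=\; \sup_{\tau \in \Delta(\Delta(\Theta))}
  \Bigl\{ \mathbb{E}_{\mu \sim \tau}[\,v(\mu)\,] \;:\; \mathbb{E}_{\mu \sim \tau}[\mu] = \mu_0 \Bigr\}
  \;=\; (\operatorname{cav} v)(\mu_0),
\]
% where \operatorname{cav} v is the smallest concave function majorizing v
% (its concavification).
```

The revelation principle plays a complementary role: the sender may restrict attention to signals that are action recommendations the receiver is willing to follow.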
Break 10:10 - 10:30
10:30 - 11:00
Na (Lina) Li (Harvard)
Abstract: Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning due to the exponential growth of the global state-action space with the number of agents. In this work, utilizing the exponential decay property of network dynamics, we first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local Q-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate the advantages of our approach over generic function approximation approaches to representing the local Q-functions.
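To make the notion of a local linear subspace concrete, one illustrative (and simplified) way such a representation is often written is the following; the symbols are expository assumptions, not the construction from the talk:

```latex
% \mathcal{N}_i: agent i's (truncated) neighborhood in the network;
% \phi_i: a feature/representation map; w_i: a weight vector. All placeholders.
\[
  Q_i(s, a) \;\approx\; \phi_i\bigl(s_{\mathcal{N}_i}, a_{\mathcal{N}_i}\bigr)^{\top} w_i ,
\]
% i.e., the exponential decay property justifies truncating the dependence to a local
% neighborhood, and the (spectral) features \phi_i span a linear subspace in which the
% local Q-function is approximately represented.
```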
11:00 - 11:30
Aditya Mahajan (McGill)
Abstract: We consider a dynamic team problem where all agents are using agent-state-based policies. The optimal policy for such problems can be obtained via the designer's approach, but the resulting dynamic program requires solving a functional optimization problem at each stage and is therefore computationally intractable for all but small-scale models. In this talk, we present a policy search algorithm that solves a parameter optimization problem at each step and guarantees monotonic policy improvement (which implies that it converges to a locally optimal solution). We present several examples to illustrate the algorithm. Joint work with Amit Sinha and Matthieu Geist.
11:30 - 12:00
Rahul Jain (USC)
Abstract: In this talk, we explore an episodic zero-sum game setting where the learning agent plays against an arbitrary opponent. The game is unknown, as is the strategy of the other player. The opponent may itself be a learning agent, or may use any causal strategy of its choice. We propose a posterior sampling algorithm for the ego-player, and show that it can achieve sublinear regret despite both the game and the strategy used by the other player being unknown.
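As a rough illustration of the flavor of posterior sampling in this setting, here is a minimal sketch; the prior object, planner, and environment interface (prior, solve_maximin, run_episode) are hypothetical placeholders rather than the algorithm analyzed in the talk:

```python
# Schematic posterior-sampling loop for an episodic zero-sum game against an
# arbitrary, possibly learning, opponent. All names are illustrative placeholders.
def posterior_sampling(prior, num_episodes, run_episode, solve_maximin):
    """prior: posterior over game models, updated from observed episodes."""
    history = []
    for _ in range(num_episodes):
        sampled_game = prior.sample()           # draw one game model from the posterior
        policy = solve_maximin(sampled_game)    # ego-player's maximin policy for that model
        trajectory = run_episode(policy)        # play one episode vs. the unknown opponent
        history.append(trajectory)
        prior.update(trajectory)                # Bayesian update with the new data
    return history
```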
Lunch & Poster Session 12:00 - 13:50
13:50 - 14:20
Tamer Başar (UIUC)
Abstract: Decision making in dynamic uncertain environments with multiple agents arises in many disciplines and application domains, including control, communications, distributed optimization, social networks, and economics. Here a natural, and comprehensive, framework for modeling, optimization, and analysis is the one provided by stochastic dynamic games (SDGs), which accommodates different solution concepts depending on how the interactions among the agents are modeled, particularly whether they are in a cooperative mode (with the same objective functions, as in teams), in a noncooperative mode (with different objective functions), or a mix of the two, such as teams of agents interacting noncooperatively across different teams (and of course cooperatively within each team). What also affects (strategic) interactions among the agents is the asymmetric nature of the information different agents acquire (and do not share, or only partially and selectively share, with others, even within teams). What makes such problems even more challenging in a dynamic environment with networked agents is the dependence of the information available to one agent at some point in time on the policies or decisions of other agents who have already acted at earlier instants of time. Such decision problems, initially studied in a team framework, are known as those with nonclassical information, where optimal policies of team agents must be designed to balance a tradeoff between contributing to the optimality of the team objective function and signaling, through their actions, useful information to other agents in their neighborhood who act after them. The existence of such a tradeoff between signaling and optimization creates even more challenging issues in SDGs with misaligned objectives among at least a subset of agents, which, however, can be addressed effectively for a specially structured subclass of such games, namely mean-field games.
This talk will provide an overview of the landscape above, first for a general class of stochastic dynamic teams and games, and then for a subclass where the objective functions are quadratic, and the interaction relationships are linear. The talk will also cover, as time permits, reinforcement learning embedded into policy development, when agents do not have precise information on the underlying models.
14:20 - 14:50
Vijay Subramanian (UMich)
Abstract: Multi-agent systems appear in many engineering and socioeconomic settings, wherein a group of agents or controllers interact with each other in a shared and possibly non-stationary environment, and make sequential decisions based on their own information using a (causal) interaction mechanism. In this talk, we focus attention on cooperative sequential decision making under uncertainty—a decentralized team, where a fixed finite number of agents act as a team with the common goal of minimizing a long-term cost function. We investigate the general situation where one long-term (objective) cost must be minimized, while maintaining multiple other long-term (constraint) costs within prescribed limits, via a cooperative Multi-Agent Constrained Partially Observable Markov Decision Process (MAC-POMDP) model. Such constrained sequential team decision problems arise in several real-world applications where efficient operation must be balanced with maintaining safe operating margins—such considerations arise in communication networks, traffic management, energy-grid optimization, e-commerce pricing, environmental monitoring, etc. We focus on the discounted cost criterion, and start by establishing general results on Lagrangian duality and the existence of a global saddle point. Next, we consider decentralized policy profiles and their mixtures, and establish that when agents mix jointly over their policy profiles, there is no (Lagrangian) duality gap, and a global saddle point exists under Slater's condition. However, when agents mix independently over their policy profiles, we show (through a concrete counterexample) that a non-zero duality gap can exist. Then, we consider coordination policies and their mixtures, and establish that, except for pure coordination policies, they are all equivalent to joint mixtures of decentralized policy profiles. This equivalence result helps reformulate the original multi-agent constrained optimization problem into a single-agent constrained optimization problem, which is then used to propose a primal-dual framework for model-based optimal control. Finally, we extend the notion of a Multi-Agent Approximate Information State (MA-AIS) to constrained decision making, and formalize MA-AIS-based coordination policies and their mixtures. We establish through a concrete counterexample that, in contrast to the case of behavioral coordination policies, MA-AIS-based behavioral coordination policies and their mixtures are not equivalent. We also establish approximate optimality of mixtures of MA-AIS-based coordination policies, and use this result to guide the development of a data-driven alternative to the aforementioned model-based primal-dual framework. (This is joint work with Nouman Khan, Amazon Search, Seattle, WA, carried out while he was at the University of Michigan, Ann Arbor.)
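To fix notation for the Lagrangian duality discussed above, a generic constrained formulation of this kind can be written as follows (the symbols are illustrative, not the talk's exact model):

```latex
% J_0: long-term objective cost; J_1, ..., J_K: long-term constraint costs with
% bounds c_1, ..., c_K; \pi ranges over (mixtures of) policy profiles. Placeholders.
\[
  L(\pi, \lambda) \;=\; J_0(\pi) \;+\; \sum_{k=1}^{K} \lambda_k \bigl( J_k(\pi) - c_k \bigr),
  \qquad \lambda_k \ge 0 .
\]
% "No duality gap" and "global saddle point" refer to
% \inf_{\pi} \sup_{\lambda \ge 0} L(\pi, \lambda) = \sup_{\lambda \ge 0} \inf_{\pi} L(\pi, \lambda),
% with the common value attained at some (\pi^*, \lambda^*).
```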
14:50 - 15:20
Alex Olshevsky (Boston University)
Abstract: We introduce gradient splitting, a unified gradient-based framework that allows us to obtain new insights into temporal-difference learning and actor-critic methods. By analyzing these classic algorithms through the lens of gradient splitting, we derive improved convergence guarantees in centralized settings as well as a deeper understanding of why these methods work. Surprisingly, this same framework shows that in a variety of decentralized policy-evaluation tasks, simple one-shot averaging—where each agent learns independently and only aggregates its final parameters—matches or surpasses more complex protocols that rely on continuous inter-agent communication.
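As a concrete illustration of one-shot averaging for decentralized policy evaluation, here is a minimal sketch using standard linear TD(0); the feature map and the agents' local trajectories are placeholders, and this is not claimed to be the exact setting analyzed in the talk:

```python
# One-shot averaging (illustrative sketch): each agent runs linear TD(0) on its own
# trajectory, and parameters are averaged exactly once at the end, with no
# communication during learning.
import numpy as np

def td0_linear(trajectory, feat, dim, gamma=0.99, alpha=0.05):
    """Linear TD(0) on one agent's trajectory of (state, reward, next_state) tuples."""
    theta = np.zeros(dim)
    for s, r, s_next in trajectory:
        td_error = r + gamma * feat(s_next) @ theta - feat(s) @ theta
        theta += alpha * td_error * feat(s)
    return theta

def one_shot_averaging(local_trajectories, feat, dim):
    """Each agent learns independently; parameters are aggregated once at the end."""
    thetas = [td0_linear(traj, feat, dim) for traj in local_trajectories]
    return np.mean(thetas, axis=0)
```

The contrast is with protocols that average or gossip parameters after every update; here inter-agent communication happens exactly once, at the end.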
Break 15:20 - 15:40
15:40 - 16:10
Serdar Yüksel (Queen's University)
Abstract: We study decentralized stochastic control problems with general spaces, under three information structures: the one-step delayed information sharing pattern (OSDISP), the K-step periodic information sharing pattern (KSPISP), and the completely decentralized information structure (CDIS), in which there is no sharing of information among the decision makers. For such general standard Borel models, the existence, rigorous approximation, and learning of optimal policies have generally been open problems for each of these structures. To this end, we first show that models with OSDISP and KSPISP patterns can be reduced to a centralized MDP, generalizing prior results which considered finite, linear, or static models. A critical measurability result is our primary contribution in this context, via an analysis based on the Young topology on policies. We then provide sufficient conditions for the transition kernels to be weak-Feller, which leads to the existence of an optimal solution and facilitates rigorous approximation and learning-theoretic results. We will then show that under CDIS, finite-memory local policies are near optimal under a joint conditional mixing condition, which may also be taken to be stationary under further conditions (thus generalizing related recent results on POMDPs). We finally show that under the KSPISP and OSDISP, a quantized Q-learning algorithm converges asymptotically to a near optimal solution. Thus, our contributions are to provide existence results and rigorous approximation results for decentralized stochastic control problems under the CDIS, KSPISP, and OSDISP. Furthermore, the Q-learning algorithm we propose comes with explicit performance guarantees. Joint work with Omar Mrani-Zentar.
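For intuition about quantized Q-learning, the following is a minimal sketch in which states are mapped to finitely many bins and tabular Q-learning is run on the quantized chain; the environment interface and the quantizer are placeholders, not the construction with convergence guarantees presented in the talk:

```python
# Quantized Q-learning (illustrative): states are mapped to a finite set of bins and
# standard tabular Q-learning is run on the quantized states. env.reset()/env.step()
# and quantize() are hypothetical placeholders.
import numpy as np

def quantized_q_learning(env, quantize, n_bins, n_actions,
                         n_steps=100_000, gamma=0.95, alpha=0.1, eps=0.1):
    Q = np.zeros((n_bins, n_actions))
    s = env.reset()
    for _ in range(n_steps):
        q = quantize(s)                                  # map the state to a bin index
        if np.random.rand() < eps:                       # epsilon-greedy exploration
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[q]))
        s_next, r = env.step(a)                          # placeholder environment interface
        q_next = quantize(s_next)
        # standard tabular Q-learning update on the quantized chain
        Q[q, a] += alpha * (r + gamma * Q[q_next].max() - Q[q, a])
        s = s_next
    return Q
```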
16:10 - 16:40
Prashant Mehta (UIUC)
Abstract: TBD
16:40 - 17:10
Kaiqing Zhang (UMD)
Abstract: In this talk, we will study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information-sharing among agents, a common practice in empirical multi-agent RL and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for efficiently solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., achievable in quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the team-optimal solution in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a much more challenging goal. We establish concrete computational and sample complexities under several common structural assumptions of the model. We hope our study can inspire more work that examines and leverages information structures, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.
17:10 - 18:00