8:30 - 8:40
8:40 - 9:10
Serdar Yüksel (Queen's University)
Abstract: We study decentralized stochastic control problems with general spaces, under three information structures: the one-step delayed information sharing pattern (OSDISP), the K-step periodic information sharing pattern (KSPISP), and the completely decentralized information structure (CDIS), where there is no sharing of information among the decision makers. The single-agent case, involving POMDPs, will also be reviewed as a special case. For such models, the existence, rigorous approximations, and learning of optimal policies have generally been open problems. To this end, we first show that models with the OSDISP and KSPISP patterns can be reduced/lifted to a centralized MDP, generalizing prior results which considered finite, linear, or static models. A critical measurability result, obtained via an analysis based on the Young topology on policies, is our primary contribution in this context. We then provide sufficient conditions for the lifted transition kernels to be weak-Feller, which leads to the existence of an optimal solution and facilitates rigorous approximation and learning-theoretic results. We will then show that under CDIS, optimal policies exist by a similar approach. Notably, we show the near optimality of finite-memory policies for the OSDISP and KSPISP under a predictor stability condition, generalizing recent related results on POMDPs. Once existence and weak-Feller regularity are established, we will review the quantized Q-learning algorithm, which converges to near optimality. This provides a general recipe for learning results: first lift the problem to a Borel-space-valued MDP, then apply quantized Q-learning. As a corollary, we show that for POMDPs, and for decentralized stochastic control under the KSPISP and OSDISP, the learning algorithm converges asymptotically to a near-optimal solution. Thus, our contributions are to provide existence, rigorous approximation, and learning results for decentralized stochastic control problems under the CDIS, KSPISP, and OSDISP. Connections with related results on POMDPs will also be presented. Joint work with Ali Kara, Yunus Emre Demirci, Naci Saldi, and Omar Mrani-Zentar.
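For orientation, a minimal Python sketch of the "quantize, then run tabular Q-learning" recipe mentioned above, in the single-agent setting with hypothetical dynamics on [0, 1]; the dynamics, costs, and parameters are illustrative stand-ins, not the speaker's model or implementation.

```python
import numpy as np

# Illustrative quantized Q-learning on a continuous-state MDP (hypothetical model).

rng = np.random.default_rng(0)

N_BINS, N_ACTIONS = 20, 2                  # quantization level and action set size
GAMMA, EPISODES, HORIZON = 0.95, 2000, 50

def step(x, a):
    """Hypothetical dynamics on [0, 1] with additive noise and a quadratic stage cost."""
    x_next = np.clip(0.9 * x + 0.1 * a + 0.05 * rng.standard_normal(), 0.0, 1.0)
    return x_next, (x_next - 0.5) ** 2

def quantize(x):
    """Map a continuous state to the index of its quantization bin."""
    return min(int(x * N_BINS), N_BINS - 1)

Q = np.zeros((N_BINS, N_ACTIONS))
visits = np.zeros((N_BINS, N_ACTIONS))

for _ in range(EPISODES):
    x = rng.uniform()
    for _ in range(HORIZON):
        s = quantize(x)
        a = rng.integers(N_ACTIONS)                  # exploration policy
        x_next, cost = step(x, a)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                   # decaying step size
        target = cost + GAMMA * Q[quantize(x_next)].min()   # cost-minimizing target
        Q[s, a] += alpha * (target - Q[s, a])
        x = x_next

policy = Q.argmin(axis=1)   # near-optimal policy on the quantized model
print(policy)
```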
9:10 - 9:40
Chi Jin (Princeton)
Abstract: While classical game theory primarily focuses on finding equilibria, modern machine learning applications introduce a series of new challenges where standard equilibrium notions are no longer sufficient, and the development of new efficient algorithmic solutions is urgently needed. In this talk, we will demonstrate two such scenarios: (1) a natural goal in multiagent learning is to learn rationalizable behavior, which avoids iteratively dominated actions. Unfortunately, such rationalizability is not guaranteed by standard equilibria, especially when approximation errors are present. Our work presents the first line of efficient algorithms for learning rationalizable equilibria with sample complexities that are polynomial in all problem parameters, including the number of players; (2) in multiplayer symmetric constant-sum games like Mahjong or Poker, a natural baseline is to achieve an equal share of the total reward. We demonstrate that the self-play meta-algorithms used by existing state-of-the-art systems can fail to achieve this simple baseline in general symmetric games. We will then discuss the new principled solution concept required to achieve this goal.
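As background on rationalizable behavior, the sketch below performs iterated elimination of strictly dominated pure actions in a hypothetical two-player normal-form game; the payoff matrices and function names are illustrative only, not from the talk.

```python
import numpy as np

def iterated_elimination(payoffs):
    """Iteratively remove pure actions strictly dominated by another pure action
    in a two-player normal-form game. payoffs[i] is player i's payoff matrix,
    indexed by (action of player 1, action of player 2)."""
    actions = [list(range(payoffs[0].shape[0])), list(range(payoffs[0].shape[1]))]
    changed = True
    while changed:
        changed = False
        for i in (0, 1):
            opp = 1 - i
            keep = []
            for a in actions[i]:
                dominated = any(
                    all(_payoff(payoffs[i], i, b, o) > _payoff(payoffs[i], i, a, o)
                        for o in actions[opp])
                    for b in actions[i] if b != a
                )
                if not dominated:
                    keep.append(a)
            if keep != actions[i]:
                actions[i], changed = keep, True
    return actions

def _payoff(M, i, own, other):
    """Index the payoff matrix so that `own` is player i's action."""
    return M[own, other] if i == 0 else M[other, own]

# Hypothetical 3x3 game: the third action of each player is strictly dominated.
A = np.array([[3, 2, 1], [2, 3, 1], [0, 0, 0]])
B = np.array([[3, 2, 0], [2, 3, 0], [1, 1, 0]])
print(iterated_elimination([A, B]))   # surviving (rationalizable) pure actions
```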
9:40 - 10:10
Charalambos D. Charalambous (University of Cyprus)
Abstract: In this talk, we review several approaches to classical stochastic games and control, developed under the assumption of centralized information structures, and chart a path that accounts for decentralized information structures. Along this path we discuss (i) strategies that do not have perfect recall of their information structures, based on a decentralized stochastic Pontryagin maximum principle, decentralized conditional Hamiltonian systems, and backward stochastic differential equations, and (ii) strategies that have perfect recall of their information structures, based on the dynamic programming principle for Markovian and non-Markovian stochastic control systems. We also discuss the connections of (i) and (ii) to two fundamental concepts: Girsanov's change of probability measure, and weak and strong solutions of stochastic differential equations and discrete-time stochastic difference equations.
Break 10:10 - 10:30
10:30 - 11:00
Na (Lina) Li (Harvard)
Abstract: Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning due to the exponential growth of the global state-action space with the number of agents. In this work, utilizing the exponential decay property of network dynamics, we first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local Q-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate the advantages of our approach over generic function approximation approaches to representing the local Q-functions.
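To make the linear-subspace idea concrete, the sketch below represents one agent's local Q-function as a linear combination of features of its neighborhood state and local action, fitted by regularized least squares; the feature map, data, and targets are hypothetical stand-ins, not the spectral representations of the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8                                    # dimension of the local feature map

def local_features(z_neighborhood, a_i):
    """Hypothetical feature map phi(s_{N_i}, a_i) in R^D for agent i's neighborhood."""
    z = np.concatenate([z_neighborhood, [a_i]])
    freqs = np.arange(1, D + 1)
    return np.cos(freqs * z.sum())       # stand-in for learned spectral features

# Fake rollout data: neighborhood states, local actions, and regression targets
# (e.g. empirical Bellman targets for the local Q-function).
Z = rng.uniform(size=(500, 3))
A = rng.integers(2, size=500)
targets = rng.uniform(size=500)

Phi = np.stack([local_features(Z[t], A[t]) for t in range(500)])
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(D), Phi.T @ targets)

def local_q(z_neighborhood, a_i):
    """Local Q-value of agent i, linear in the local representation."""
    return local_features(z_neighborhood, a_i) @ w

print(local_q(Z[0], 1))
```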
11:00 - 11:30
Aditya Mahajan (McGill)
Abstract: In this talk, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems and allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models with Lipschitz cost and dynamics, we derive upper bounds on the sub-optimality of certainty equivalent policies in terms of the expected error of the proposed estimator. We present several examples to illustrate the results. Joint work with Berk Bozkurt, Ashutosh Nayyar, and Yi Ouyang.
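A minimal sketch of a certainty equivalent policy in the sense described above, assuming a hypothetical scalar system: the optimal state-feedback policy is evaluated at a state estimate rather than at the true state. All gains, dynamics, and the estimator below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def pi_star(x):
    """Hypothetical optimal state-feedback policy (here, a linear gain)."""
    return -0.8 * x

def estimator(y_history):
    """Any state estimator can be plugged in here; this stand-in uses the most
    recent noisy observation (the MMSE estimate in the classical LQG case would
    instead come from a Kalman filter)."""
    return y_history[-1]

x, y_history, total_cost = 1.0, [], 0.0
for t in range(50):
    y_history.append(x + 0.1 * rng.standard_normal())   # noisy observation
    u = pi_star(estimator(y_history))                    # certainty equivalent action
    total_cost += x**2 + u**2                            # quadratic stage cost
    x = 0.9 * x + u + 0.05 * rng.standard_normal()       # hypothetical dynamics
print(total_cost)
```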
11:30 - 12:20
Matthew S. Hankins
Nguyen Anh Duc
Nicolas Chatzikiriakos
Rahul Misra
Mengyuan Zhao
Lunch Break 12:20 - 14:00
14:00 - 14:30
Tamer Başar (UIUC)
Abstract: Decision making in dynamic uncertain environments with multiple agents arises in many disciplines and application domains, including control, communications, distributed optimization, social networks, and economics. Here a natural framework, and a comprehensive one, for modeling, optimization, and analysis is the one provided by stochastic dynamic games (SDGs), which accommodates different solution concepts depending on how the interactions among the agents are modeled, particularly whether they are in a cooperative mode (with the same objective functions, as in teams) or in a noncooperative mode (with different objective functions) or a mix of the two, such as teams of agents interacting noncooperatively across different teams (and of course cooperatively within each team). What also affects (strategic) interactions among the agents is the asymmetric nature of the information different agents acquire (and do not share or only partially share (selectively) with others, even within teams). What makes such problems even more challenging in a dynamic environment with networked agents is the dependence of the information available to one agent at some point in time on the policies or decisions of other agents who have already acted at earlier instants of time. Such decision problems, initially studied in a team framework, are known as those with nonclassical information where optimal policies of team agents must be designed to balance a tradeoff between contribution to optimality of the team objective function and signaling through their actions useful information to other agents in their neighborhood who would be acting after them. Existence of such a tradeoff between signaling and optimization creates even more challenging issues in SDGs with misaligned objectives among at least a subset of agents, which, however, can be addressed effectively for a specially structured subclass of such games, namely mean-field games.
This talk will provide an overview of the landscape above, first for a general class of stochastic dynamic teams and games, and then for a subclass where the objective functions are quadratic, and the interaction relationships are linear. The talk will also cover, as time permits, reinforcement learning embedded into policy development, when agents do not have precise information on the underlying models.
14:30 - 15:00
Vijay Subramanian (UMich)
Abstract: Multi-agent systems appear in many engineering and socioeconomic settings, wherein a group of agents or controllers interact with each other in a shared and possibly non-stationary environment, and make sequential decisions based on their own information using a (causal) interaction mechanism. In this talk, we focus attention on cooperative sequential decision making under uncertainty: a decentralized team, where a fixed finite number of agents act as a team with the common goal of minimizing a long-term cost function. We investigate the general situation where one long-term (objective) cost must be minimized while maintaining multiple other long-term (constraint) costs within prescribed limits, via a cooperative Multi-Agent Constrained Partially Observable Markov Decision Process (MAC-POMDP) model. Such constrained sequential team decision problems arise in several real-world applications where efficient operation must be balanced with maintaining safe operating margins; such considerations arise in communication networks, traffic management, energy-grid optimization, e-commerce pricing, environmental monitoring, etc. We focus on the discounted cost criterion, and start by establishing general results on Lagrangian duality and the existence of a global saddle point. Next, we consider decentralized policy profiles and their mixtures, and establish that when agents mix jointly over their policy profiles, there is no (Lagrangian) duality gap, and a global saddle point exists under Slater's condition. However, when agents mix independently over their policy profiles, we show (through a concrete counterexample) that a non-zero duality gap can exist. Then, we consider coordination policies and their mixtures, and establish that, except for pure coordination policies, they are all equivalent to joint mixtures of decentralized policy profiles. This equivalence result helps reformulate the original multi-agent constrained optimization problem into a single-agent constrained optimization problem, which is then used to propose a primal-dual framework for model-based optimal control. Finally, we extend the notion of a Multi-Agent Approximate Information State (MA-AIS) to constrained decision making, and formalize MA-AIS-based coordination policies and their mixtures. We establish through a concrete counterexample that, in contrast to behavioral coordination policies, MA-AIS-based behavioral coordination policies and their mixtures are not equivalent. We also establish the approximate optimality of mixtures of MA-AIS-based coordination policies, and use this result to guide the development of a data-driven alternative to the aforementioned model-based primal-dual framework. (This is joint work with Nouman Khan, Amazon Search, Seattle, WA, which was carried out when he was associated with the University of Michigan, Ann Arbor.)
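To illustrate the primal-dual idea on a toy stand-in (not the MAC-POMDP framework itself), the sketch below runs dual subgradient ascent on a single Lagrange multiplier while the primal player best-responds over a hypothetical one-parameter policy class.

```python
import numpy as np

# Toy constrained problem: minimize an objective cost over a policy class
# indexed by theta, subject to a constraint cost being <= 0. Everything here
# is a hypothetical stand-in chosen so that a saddle point exists.

thetas = np.linspace(0.0, 1.0, 201)      # toy policy class indexed by theta
objective = (thetas - 0.8) ** 2          # long-term objective cost of each policy
constraint = thetas - 0.4                # long-term constraint cost, required <= 0

lmbda, step = 0.0, 0.5
for _ in range(200):
    k = np.argmin(objective + lmbda * constraint)    # primal best response
    lmbda = max(0.0, lmbda + step * constraint[k])   # dual subgradient ascent
print(thetas[k], lmbda)                  # approximate saddle point (theta*, lambda*)
```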
15:00 - 15:30
Alex Olshevsky (Boston University)
Abstract: We introduce gradient splitting, a unified gradient-based framework that allows us to obtain new insights into temporal-difference learning and actor-critic methods. By analyzing these classic algorithms through the lens of gradient splitting, we derive improved convergence guarantees in centralized settings as well as a deeper understanding of why these methods work. Surprisingly, this same framework shows that in a variety of decentralized policy-evaluation tasks, simple one-shot averaging—where each agent learns independently and only aggregates its final parameters—matches or surpasses more complex protocols that rely on continuous inter-agent communication.
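A minimal sketch of the one-shot averaging protocol described above, assuming a hypothetical finite-state chain for policy evaluation: each agent runs TD(0) independently on its own trajectory, and the value estimates are averaged exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(3)
N_STATES, GAMMA, N_AGENTS, STEPS = 5, 0.9, 10, 5000

P = np.full((N_STATES, N_STATES), 1.0 / N_STATES)   # policy-induced transition matrix
r = rng.uniform(size=N_STATES)                        # per-state reward

def td0(steps, alpha=0.05):
    """One agent's independent TD(0) run with a tabular value estimate."""
    V, s = np.zeros(N_STATES), 0
    for _ in range(steps):
        s_next = rng.choice(N_STATES, p=P[s])
        V[s] += alpha * (r[s] + GAMMA * V[s_next] - V[s])
        s = s_next
    return V

V_avg = np.mean([td0(STEPS) for _ in range(N_AGENTS)], axis=0)   # one-shot average
V_true = np.linalg.solve(np.eye(N_STATES) - GAMMA * P, r)         # exact values
print(np.abs(V_avg - V_true).max())
```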
Break 15:30 - 15:50
15:50 - 16:20
Prashant Mehta (UIUC)
Abstract: Transformer is the name of the core algorithm inside a large language model (LLM). In the so-called decoder-only transformer, a finite sequence of symbols (tokens) is mapped to the conditional probability of the next token. In this talk, I situate the transformer within the broader history of prediction theory: in the early 1940s, Wiener introduced a linear predictor, where the conditional expectation of future data is computed by linearly combining the past data. I argue that a decoder-only transformer generalizes this idea and that a transformer is best understood as a causal nonlinear predictor. The technical results for causal nonlinear prediction are described for the special case where the data is discrete-valued and generated from an underlying hidden Markov model (HMM).
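For the HMM special case mentioned above, a minimal sketch of a causal nonlinear predictor: the conditional law of the next token given the past is computed exactly by the forward (filtering) recursion. The two-state HMM below is a hypothetical example, not from the talk.

```python
import numpy as np

A = np.array([[0.9, 0.1],       # hidden-state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],       # emission probabilities: P(token | hidden state)
              [0.1, 0.9]])
pi0 = np.array([0.5, 0.5])      # initial hidden-state distribution

def next_token_probs(tokens):
    """P(next token | past tokens) via the HMM forward (filtering) recursion."""
    belief = pi0.copy()
    for y in tokens:
        belief = belief * B[:, y]            # condition on the observed token
        belief = belief / belief.sum()
        belief = belief @ A                  # propagate the hidden state one step
    return belief @ B                        # predictive law of the next token

print(next_token_probs([0, 0, 1, 1]))        # conditional distribution over {0, 1}
```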
16:20 - 16:50
Ali Jadbabaie (MIT)
Abstract: In this talk, I will review some of my group’s work on information aggregation, social learning and decentralized estimation in networks, reviewing the literature from statistics, economics, signal processing, and control. The goal of this line of research is to understand the statistical and computational complexities of information aggregation and learning using heterogeneous, disparate sources.
16:50 - 17:20
Kaiqing Zhang (UMD)
Abstract: In this talk, we will study provable partially observable reinforcement learning (RL) with asymmetric information, a scenario inspired by real-world deep RL applications in robot learning and autonomous driving. The asymmetric information may arise both across different agents and across the training and testing phases. We will share some of our recent explorations in understanding the pitfalls and provable benefits of partially observable RL with such asymmetric information, in terms of both sample and computational complexities, through the lens of information structures from (decentralized) stochastic control.
17:20 - 18:00
Semih Kara
Kunpeng Zhang
Qingkai Meng
Kushagra Gupta
Raj Kiriti Velicheti
Vijeth Hebbar