The Schedule
(Tentative)
8:30 - 8:40
8:40 - 9:10
Ali Jadbabaie (MIT)
Abstract: In this talk, I will review some of my group's work on information aggregation, social learning, and decentralized estimation in networks, drawing on literature from statistics, economics, signal processing, and control. The goal of this line of research is to understand the statistical and computational complexities of information aggregation and learning using heterogeneous, disparate sources.
9:10 - 9:40
Chi Jin (Princeton)
Abstract: While classical game theory primarily focuses on finding equilibria, modern machine learning applications introduce a series of new challenges where standard equilibrium notions are no longer sufficient, and the development of new efficient algorithmic solutions is urgently needed. In this talk, we will demonstrate two such scenarios. (1) A natural goal in multiagent learning is to learn rationalizable behavior, which avoids iteratively dominated actions. Unfortunately, such rationalizability is not guaranteed by standard equilibria, especially when approximation errors are present. Our work presents the first line of efficient algorithms for learning rationalizable equilibria with sample complexities that are polynomial in all problem parameters, including the number of players. (2) In multiplayer symmetric constant-sum games such as Mahjong or Poker, a natural baseline is to achieve an equal share of the total reward. We demonstrate that the self-play meta-algorithms used by existing state-of-the-art systems can fail to achieve this simple baseline in general symmetric games. We will then discuss the new principled solution concept required to achieve this goal.
9:40 - 10:10
Cedric Langbort (UIUC)
Abstract: Control strategies that rely on shaping the information available to a decision-maker have become popular recently in robotics, engineering, computer science, and information economics. The canonical model for such strategic information transmission is known as Bayesian Persuasion and was introduced in the latter field in 2011 by Kamenica and Gentzkow. As the name suggests, this setting assumes that the decision-maker whose information set is influenced acts fully rationally, and according to Bayes' rule, upon receiving messages from the sender/controller/influencer. This assumption not only delineates the kind of situations captured by the model but is also, as it turns out, central to its tractability. Indeed, Bayes' rule and the fact that the decision-maker acts optimally make two sets of tools – the so-called "concavification procedure" and the "revelation principle" – available to reformulate the sender's problem in an attractive form. In this talk, we present some recent advances on Strategic Information Transmission problems of this kind that account for a number of non-idealities on the part of the receiver, including a learning receiver. Most of this work is joint with Olivier Massicot.
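For reference, the concavification result mentioned above can be stated compactly in standard Bayesian-persuasion notation (the symbols below are generic, not taken from the talk):

```latex
% v(\mu): sender's payoff when the receiver best-responds to posterior belief \mu
% over the state space \Theta; \mu_0: common prior. Any signaling scheme induces a
% distribution \tau over posteriors whose mean is \mu_0 (Bayes plausibility), so the
% sender's optimal value is
\[
  V(\mu_0) \;=\; \sup_{\tau \in \Delta(\Delta(\Theta))}
  \Bigl\{ \mathbb{E}_{\mu \sim \tau}[\,v(\mu)\,] \;:\; \mathbb{E}_{\mu \sim \tau}[\mu] = \mu_0 \Bigr\}
  \;=\; (\operatorname{cav} v)(\mu_0),
\]
% where \operatorname{cav} v is the smallest concave function majorizing v
% (its concavification).
```

The revelation principle plays a complementary role: the sender may restrict attention to signals that are action recommendations the receiver is willing to follow.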
Break 10:10 - 10:30
10:30 - 11:00
Na (Lina) Li (Harvard)
Abstract: Network Markov Decision Processes (MDPs), a popular model for multi-agent control, pose a significant challenge to efficient learning due to the exponential growth of the global state-action space with the number of agents. In this work, utilizing the exponential decay property of network dynamics, we first derive scalable spectral local representations for network MDPs, which induce a network linear subspace for the local Q-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate the advantages of our approach over generic function approximation approaches to representing the local Q-functions.
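To make the notion of a local linear subspace concrete, one illustrative (and simplified) way such a representation is often written is the following; the symbols are expository assumptions, not the construction from the talk:

```latex
% \mathcal{N}_i: agent i's (truncated) neighborhood in the network;
% \phi_i: a feature/representation map; w_i: a weight vector. All placeholders.
\[
  Q_i(s, a) \;\approx\; \phi_i\bigl(s_{\mathcal{N}_i}, a_{\mathcal{N}_i}\bigr)^{\top} w_i ,
\]
% i.e., the exponential decay property justifies truncating the dependence to a local
% neighborhood, and the (spectral) features \phi_i span a linear subspace in which the
% local Q-function is approximately represented.
```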
11:00 - 11:30
Aditya Mahajan (McGill)
Abstract: We consider a dynamic team problem where all agents are using agent-state-based policies. The optimal policy for such problems can be obtained via the designer's approach, but the resulting dynamic program requires solving a functional optimization problem at each stage and is therefore computationally intractable for all but small-scale models. In this talk, we present a policy search algorithm that solves a parameter optimization problem at each step and guarantees monotonic policy improvement (which implies that it converges to a locally optimal solution). We present several examples to illustrate the algorithm. Joint work with Amit Sinha and Matthieu Geist.
11:30 - 12:00
Rahul Jain (USC)
Abstract: In this talk, we explore an episodic zero-sum game setting where the learning agent plays against an arbitrary opponent. The game is unknown, as is the strategy of the other player. The opponent may itself be a learning agent, or may use any causal strategy of its choice. We propose a posterior sampling algorithm for the ego-player, and show that it can achieve sublinear regret despite both the game and the strategy used by the other player being unknown.
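As a rough illustration of the flavor of posterior sampling in this setting, here is a minimal sketch; the prior object, planner, and environment interface (prior, solve_maximin, run_episode) are hypothetical placeholders rather than the algorithm analyzed in the talk:

```python
# Schematic posterior-sampling loop for an episodic zero-sum game against an
# arbitrary, possibly learning, opponent. All names are illustrative placeholders.
def posterior_sampling(prior, num_episodes, run_episode, solve_maximin):
    """prior: posterior over game models, updated from observed episodes."""
    history = []
    for _ in range(num_episodes):
        sampled_game = prior.sample()           # draw one game model from the posterior
        policy = solve_maximin(sampled_game)    # ego-player's maximin policy for that model
        trajectory = run_episode(policy)        # play one episode vs. the unknown opponent
        history.append(trajectory)
        prior.update(trajectory)                # Bayesian update with the new data
    return history
```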
Lunch & Poster Session 12:00 - 13:50
13:50 - 14:20
Tamer Başar (UIUC)
Abstract: Decision making in dynamic uncertain environments with multiple agents arises in many disciplines and application domains, including control, communications, distributed optimization, social networks, and economics. Here a natural, and comprehensive, framework for modeling, optimization, and analysis is the one provided by stochastic dynamic games (SDGs), which accommodates different solution concepts depending on how the interactions among the agents are modeled, particularly whether they are in a cooperative mode (with the same objective functions, as in teams), in a noncooperative mode (with different objective functions), or a mix of the two, such as teams of agents interacting noncooperatively across different teams (and of course cooperatively within each team). What also affects (strategic) interactions among the agents is the asymmetric nature of the information different agents acquire (and do not share, or only partially and selectively share, with others, even within teams). What makes such problems even more challenging in a dynamic environment with networked agents is the dependence of the information available to one agent at some point in time on the policies or decisions of other agents who have already acted at earlier instants of time. Such decision problems, initially studied in a team framework, are known as those with nonclassical information, where optimal policies of team agents must be designed to balance a tradeoff between contributing to the optimality of the team objective function and signaling, through their actions, useful information to other agents in their neighborhood who act after them. The existence of such a tradeoff between signaling and optimization creates even more challenging issues in SDGs with misaligned objectives among at least a subset of agents, which, however, can be addressed effectively for a specially structured subclass of such games, namely mean-field games.
This talk will provide an overview of the landscape above, first for a general class of stochastic dynamic teams and games, and then for a subclass where the objective functions are quadratic, and the interaction relationships are linear. The talk will also cover, as time permits, reinforcement learning embedded into policy development, when agents do not have precise information on the underlying models.
14:20 - 14:50
Vijay Subramanian (UMich)
Abstract: Multi-agent systems appear in many engineering and socioeconomic settings, wherein a group of agents or controllers interact with each other in a shared and possibly non-stationary environment, and make sequential decisions based on their own information using a (causal) interaction mechanism. In this talk, we focus attention on cooperative sequential decision making under uncertainty—a decentralized team, where a fixed finite number of agents act as a team with the common goal of minimizing a long-term cost function. We investigate the general situation where one long-term (objective) cost must be minimized, while maintaining multiple other long-term (constraint) costs within prescribed limits, via a cooperative Multi-Agent Constrained Partially Observable Markov Decision Process (MAC-POMDP) model. Such constrained sequential team decision problems arise in several real-world applications where efficient operation must be balanced with maintaining safe operating margins—such considerations arise in communication networks, traffic management, energy-grid optimization, e-commerce pricing, environmental monitoring, etc. We focus on the discounted cost criterion, and start by establishing general results on Lagrangian duality and the existence of a global saddle point. Next, we consider decentralized policy profiles and their mixtures, and establish that when agents mix jointly over their policy profiles, there is no (Lagrangian) duality gap, and a global saddle point exists under Slater's condition. However, when agents mix independently over their policy profiles, we show (through a concrete counterexample) that a non-zero duality gap can exist. Then, we consider coordination policies and their mixtures, and establish that, except for pure coordination policies, they are all equivalent to joint mixtures of decentralized policy profiles. This equivalence result helps reformulate the original multi-agent constrained optimization problem into a single-agent constrained optimization problem, which is then used to propose a primal-dual framework for model-based optimal control. Finally, we extend the notion of a Multi-Agent Approximate Information State (MA-AIS) to constrained decision making, and formalize MA-AIS-based coordination policies and their mixtures. We establish through a concrete counterexample that, in contrast to the case of behavioral coordination policies, MA-AIS-based behavioral coordination policies and their mixtures are not equivalent. We also establish approximate optimality of mixtures of MA-AIS-based coordination policies, and use this result to guide the development of a data-driven alternative to the aforementioned model-based primal-dual framework. (This is joint work with Nouman Khan, Amazon Search, Seattle, WA, carried out while he was at the University of Michigan, Ann Arbor.)
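To fix notation for the Lagrangian duality discussed above, a generic constrained formulation of this kind can be written as follows (the symbols are illustrative, not the talk's exact model):

```latex
% J_0: long-term objective cost; J_1, ..., J_K: long-term constraint costs with
% bounds c_1, ..., c_K; \pi ranges over (mixtures of) policy profiles. Placeholders.
\[
  L(\pi, \lambda) \;=\; J_0(\pi) \;+\; \sum_{k=1}^{K} \lambda_k \bigl( J_k(\pi) - c_k \bigr),
  \qquad \lambda_k \ge 0 .
\]
% "No duality gap" and "global saddle point" refer to
% \inf_{\pi} \sup_{\lambda \ge 0} L(\pi, \lambda) = \sup_{\lambda \ge 0} \inf_{\pi} L(\pi, \lambda),
% with the common value attained at some (\pi^*, \lambda^*).
```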
14:50 - 15:20
Alex Olshevsky (Boston University)
Abstract: We introduce gradient splitting, a unified gradient-based framework that allows us to obtain new insights into temporal-difference learning and actor-critic methods. By analyzing these classic algorithms through the lens of gradient splitting, we derive improved convergence guarantees in centralized settings as well as a deeper understanding of why these methods work. Surprisingly, this same framework shows that in a variety of decentralized policy-evaluation tasks, simple one-shot averaging—where each agent learns independently and only aggregates its final parameters—matches or surpasses more complex protocols that rely on continuous inter-agent communication.
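As a concrete illustration of one-shot averaging for decentralized policy evaluation, here is a minimal sketch using standard linear TD(0); the feature map and the agents' local trajectories are placeholders, and this is not claimed to be the exact setting analyzed in the talk:

```python
# One-shot averaging (illustrative sketch): each agent runs linear TD(0) on its own
# trajectory, and parameters are averaged exactly once at the end, with no
# communication during learning.
import numpy as np

def td0_linear(trajectory, feat, dim, gamma=0.99, alpha=0.05):
    """Linear TD(0) on one agent's trajectory of (state, reward, next_state) tuples."""
    theta = np.zeros(dim)
    for s, r, s_next in trajectory:
        td_error = r + gamma * feat(s_next) @ theta - feat(s) @ theta
        theta += alpha * td_error * feat(s)
    return theta

def one_shot_averaging(local_trajectories, feat, dim):
    """Each agent learns independently; parameters are aggregated once at the end."""
    thetas = [td0_linear(traj, feat, dim) for traj in local_trajectories]
    return np.mean(thetas, axis=0)
```

The contrast is with protocols that average or gossip parameters after every update; here inter-agent communication happens exactly once, at the end.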
Break 15:20 - 15:40
15:40 - 16:10
Serdar Yüksel (Queen's University)
Abstract: We study decentralized stochastic control problems with general spaces, under three information structures: the one-step delayed information sharing pattern (OSDISP), the K-step periodic information sharing pattern (KSPISP), and the completely decentralized information structure (CDIS), in which there is no sharing of information among the decision makers. For such general standard Borel models, the existence, rigorous approximation, and learning of optimal policies have generally been open problems for each of these structures. To this end, we first show that models with OSDISP and KSPISP patterns can be reduced to a centralized MDP, generalizing prior results which considered finite, linear, or static models. A critical measurability result is our primary contribution in this context, via an analysis based on the Young topology on policies. We then provide sufficient conditions for the transition kernels to be weak-Feller, which leads to the existence of an optimal solution and facilitates rigorous approximation and learning-theoretic results. We will then show that under CDIS, finite-memory local policies are near optimal under a joint conditional mixing condition, which may also be taken to be stationary under further conditions (thus generalizing related recent results on POMDPs). We finally show that under the KSPISP and OSDISP, a quantized Q-learning algorithm converges asymptotically to a near optimal solution. Thus, our contributions are to provide existence results and rigorous approximation results for decentralized stochastic control problems under the CDIS, KSPISP, and OSDISP. Furthermore, the Q-learning algorithm we propose comes with explicit performance guarantees. Joint work with Omar Mrani-Zentar.
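For intuition about quantized Q-learning, the following is a minimal sketch in which states are mapped to finitely many bins and tabular Q-learning is run on the quantized chain; the environment interface and the quantizer are placeholders, not the construction with convergence guarantees presented in the talk:

```python
# Quantized Q-learning (illustrative): states are mapped to a finite set of bins and
# standard tabular Q-learning is run on the quantized states. env.reset()/env.step()
# and quantize() are hypothetical placeholders.
import numpy as np

def quantized_q_learning(env, quantize, n_bins, n_actions,
                         n_steps=100_000, gamma=0.95, alpha=0.1, eps=0.1):
    Q = np.zeros((n_bins, n_actions))
    s = env.reset()
    for _ in range(n_steps):
        q = quantize(s)                                  # map the state to a bin index
        if np.random.rand() < eps:                       # epsilon-greedy exploration
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[q]))
        s_next, r = env.step(a)                          # placeholder environment interface
        q_next = quantize(s_next)
        # standard tabular Q-learning update on the quantized chain
        Q[q, a] += alpha * (r + gamma * Q[q_next].max() - Q[q, a])
        s = s_next
    return Q
```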
16:10 - 16:40
Prashant Mehta (UIUC)
Abstract: TBD
16:40 - 17:10
Kaiqing Zhang (UMD)
Abstract: In this talk, we will study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information-sharing among agents, a common practice in empirical multi-agent RL and a standard model for multi-agent control systems with communications. We first establish several computational complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for efficiently solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further approximate the shared common information to construct an approximate model of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., achievable in quasi-polynomial time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the team-optimal solution in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a much more challenging goal. We establish concrete computational and sample complexities under several common structural assumptions of the model. We hope our study can inspire more work that examines and leverages information structures, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.
17:10 - 18:00