“You’re the U. S. Robot’s psychologist, aren’t you?”
“Robopsychologist, please.”
“Oh, are robots so different from men, mentally?”
“Worlds different.” She allowed herself a frosty smile, “Robots are essentially decent.”

- Evidence, Isaac Asimov


How can we design efficient local algorithms for multiagent learning with global performance guarantees? 


Automation relegates many decision-making processes from humans to machines. In recent years, automation has been driven by artificial intelligence (AI), which enables trained machines to achieve expert-level performance in some tasks. Many of these tasks require humans to interact, so automating them will require AIs to interact, creating a networked environment where the decisions of one AI affect others and the data they learn from. Examples include autonomous vehicles, multi-robot systems, on-device learning, cloud computing, communication networks, the smart grid, and, more recently, AI agents based on large language models (LLMs). As game theory predicts, this interaction can lead to globally inefficient outcomes. However, machines follow programmatic objectives and protocols that, unlike humans, are limited not by selfish interests but by their local information and access to resources. This modern setting calls for new tools and paradigms, which are the focus of my research.


My research tackles the above question by bringing together game theory, distributed control, and multiagent learning. I study networks such as wireless networks, autonomous vehicles, the smart grid, epidemics and social networks, and edge computing. In these networks, designing scalable and efficient algorithms requires overcoming the following challenges:



In such restricted environments, also known as multiplayer bandits, agents can still learn from the information that is indirectly carried by the actions of others, which affect their environment and the reward values they observe. We have shown that learning from reward values alone, with limited communication of these values between neighboring networked agents, is, surprisingly, enough to maximize the sum of rewards of all agents under a general interaction structure.
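As a toy illustration of this idea (a hypothetical setup, not the algorithm from our work): two agents each choose between two arms, each observes only its own reward, and neighbors exchange reward values so that every agent can track the sum of rewards. Each agent then runs epsilon-greedy on the estimated reward sum for its own arms.

```python
import random

# Hypothetical joint-reward table: (a0, a1) -> (r0, r1).
# The reward *sum* is maximized at the joint action (1, 0).
REWARDS = {
    (0, 0): (1.0, 1.0), (0, 1): (0.0, 1.0),
    (1, 0): (2.0, 2.5), (1, 1): (1.0, 0.0),
}

def learn(T=20000, eps=0.1, seed=0):
    rng = random.Random(seed)
    means = [[0.0, 0.0] for _ in range(2)]   # estimated reward sum per own arm
    counts = [[0, 0] for _ in range(2)]
    for _ in range(T):
        acts = tuple(
            rng.randrange(2) if rng.random() < eps
            else (0 if means[i][0] >= means[i][1] else 1)
            for i in range(2)
        )
        # Each agent observes its own reward; the values are exchanged
        # between neighbors, so every agent learns the sum.
        total = sum(REWARDS[acts])
        for i, a in enumerate(acts):
            counts[i][a] += 1
            means[i][a] += (total - means[i][a]) / counts[i][a]
    return [0 if m[0] >= m[1] else 1 for m in means]

print(learn())  # greedy joint action after learning
```

Note that each agent optimizes the shared reward sum rather than its own reward, which is what steers the pair toward the globally optimal joint action instead of a selfishly stable but inefficient one.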


To achieve stronger performance guarantees, we can leverage the structure of the specific interaction of interest. In resource allocation, the challenge is that the agents do not know how others value the same resource. We have designed distributed resource allocation algorithms with near-optimal performance and little to no communication overhead. In congestion games with cooperative players, each player can observe its own trip time but not the load on the roads or the trip times of others. We have designed a distributed cooperative routing algorithm with no communication that learns the optimal routing.
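To make the congestion-game setting concrete, here is a minimal sketch (a toy model, not our algorithm): four players choose one of two roads, and a player's trip time equals the load on its road. Round-robin best response, where in practice each player would estimate its trip times from samples, converges to an equilibrium that here is also the balanced, welfare-optimal routing.

```python
# Toy congestion game: n_players choose among n_roads; the trip time on a
# road equals the number of players using it (linear congestion).
def best_response_dynamics(n_players=4, n_roads=2, rounds=10):
    choice = [0] * n_players            # everyone starts on road 0
    for _ in range(rounds):
        for i in range(n_players):
            loads = [0] * n_roads
            for j, r in enumerate(choice):
                if j != i:
                    loads[r] += 1
            # move to the road with the smallest resulting trip time
            choice[i] = min(range(n_roads), key=lambda r: loads[r] + 1)
    return choice

choice = best_response_dynamics()
print([choice.count(r) for r in range(2)])  # balanced loads: [2, 2]
```

Congestion games are potential games, so this sequential best-response process is guaranteed to terminate at a Nash equilibrium; in this symmetric linear example the equilibrium also minimizes the total trip time.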


Practical environments are often dynamic and include “states” that evolve as a function of the agents’ actions or external factors. Currently, the main tool for dealing with dynamic multiagent systems is reinforcement learning (RL). Multiagent RL (MARL) algorithms often assume that, while training, all agents can communicate perfectly and share a common reward. Such an assumption is only suitable for a small group of agents. For large-scale networks where centralized training is infeasible, most current MARL algorithms are either heuristic or converge to a Nash equilibrium, which can suffer from poor global performance. Instead, agents must learn to coordinate online after deployment, so the main challenge becomes the locality of agent information. To overcome these issues, we tackle dynamic multiagent learning bottom-up by gradually adding dynamic elements. The goal is a toolbox of modular components that can be combined as necessary, depending on the scenario. Following this approach, we have designed admission control for an open game where the set of players is dynamic, and a distributed learning algorithm for when the global objective is dynamic.


On-device learning trains machine learning models on edge devices rather than one large model on a central server. This distributed paradigm offers far more scalable machine learning: users can keep their private data on their devices, and training can use more data than can be uploaded to a server. In on-device learning, the loss function of one device often depends on the model parameters of others. Such interdependent training arises in multitask learning, distillation, and when the training data is correlated across devices. Interdependent training defines a game between the devices, where the actions are their model parameters and the cost functions are their training loss functions. Current on-device learning paradigms, such as federated learning, are not built to account for interdependency. With interdependent training, the gradient of the sum of losses with respect to any device's model parameters is different from the gradient of its own loss function. If we ignore interdependency and let each device minimize only its own loss function, the resulting stationary points can be poor Nash equilibria with a suboptimal sum of losses.
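A minimal quadratic example makes the gap concrete (the losses are hypothetical, chosen only for illustration): two devices with coupled losses L_i(x_0, x_1) = (x_i - 1)^2 + x_0 * x_1. Descending only one's own gradient converges to a Nash equilibrium whose total loss is strictly worse than the minimum of the sum of losses.

```python
# Each device i holds a scalar "model parameter" x_i, with coupled loss
#   L_i(x_0, x_1) = (x_i - 1)**2 + x_0 * x_1.

def grads_own(x):
    # device i descends only its own gradient: dL_i/dx_i = 2*(x_i - 1) + x_other
    return [2 * (x[0] - 1) + x[1], 2 * (x[1] - 1) + x[0]]

def grads_sum(x):
    # gradient of L_0 + L_1 picks up the cross term from the *other* loss
    return [2 * (x[0] - 1) + 2 * x[1], 2 * (x[1] - 1) + 2 * x[0]]

def descend(grad_fn, steps=2000, lr=0.05):
    x = [0.0, 0.0]
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def total_loss(x):
    return (x[0] - 1) ** 2 + (x[1] - 1) ** 2 + 2 * x[0] * x[1]

nash = descend(grads_own)  # Nash equilibrium: x = (2/3, 2/3), total loss 10/9
opt = descend(grads_sum)   # sum minimizer:    x = (1/2, 1/2), total loss 1
print(total_loss(nash), total_loss(opt))
```

Here independent training loses about 11% of the achievable total loss; with stronger coupling between the losses, the Nash equilibrium can be arbitrarily worse than the cooperative optimum.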


Game design is the choice of reward functions such that the global objective is optimized at a Nash equilibrium. Any reward functions that can be computed using only local information are a valid choice. This type of design exploits the fact that with cooperative players, the reward functions do not model what selfish players want to maximize. Instead, the reward functions are part of the players' distributed algorithm. The dramatic advantage of this approach is that it yields fully distributed algorithms that require no communication between the players. We have applied game design in wireless networks and energy allocation. Recently, we have designed an automatic game design scheme based on supervised machine learning.
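One classical instance of this principle (a textbook construction, not necessarily the design we use) is the marginal-contribution reward u_i(a) = W(a) - W(a with player i replaced by a null action), which turns the game into a potential game with potential W, so every welfare maximum is a Nash equilibrium. A small sketch with a hypothetical welfare function W:

```python
import itertools

ACTIONS = [0, 1, 2]   # action 0 serves as the "null" action

def W(a):
    # hypothetical global welfare for two players, chosen for illustration
    return a[0] + a[1] - 0.4 * a[0] * a[1]

def u(i, a):
    # marginal-contribution reward of player i
    null = list(a)
    null[i] = 0
    return W(a) - W(tuple(null))

def is_nash(a):
    # no player can gain by a unilateral deviation under the designed rewards
    for i in range(2):
        for alt in ACTIONS:
            dev = list(a)
            dev[i] = alt
            if u(i, tuple(dev)) > u(i, a) + 1e-9:
                return False
    return True

best = max(itertools.product(ACTIONS, repeat=2), key=W)
print(best, is_nash(best))  # the welfare maximizer is a Nash equilibrium
```

The key property is that a unilateral deviation by player i changes u_i by exactly the change in W, so players greedily improving their designed rewards are implicitly hill-climbing the global objective.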


In game control, the challenge is to control unknown multiagent dynamics based only on high-level feedback. This scenario arises when the multiagent system is already deployed, and we can only manage it post-design by adjusting system parameters. This is often the reality in epidemic control, transportation, energy, and supply chains. The Nash equilibrium of such large-scale systems is typically inefficient. We have shown that the manager of an unknown resource allocation game can learn to shift the Nash equilibrium to guarantee load balancing, by adjusting the prices of resources online based only on monitoring the total loads. More recently, we have designed a general online algorithm that learns to optimize the equilibria of an unknown game.
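A toy sketch of this price-adjustment loop (an illustrative model, not our algorithm): a manager observes only the total load on each of two resources and updates prices by p_r <- p_r + lr * (load_r - target). The users' response to cost is modeled here by a logit choice, which the manager never needs to know.

```python
import math

def loads(prices, base=(1.0, 3.0), n_users=100, beta=1.0):
    # users' aggregate response: logit choice over effective costs b_r + p_r
    # (unknown to the manager, who only observes the resulting loads)
    costs = [b + p for b, p in zip(base, prices)]
    weights = [math.exp(-beta * c) for c in costs]
    z = sum(weights)
    return [n_users * w / z for w in weights]

prices = [0.0, 0.0]
for _ in range(500):
    l = loads(prices)
    # raise the price of overloaded resources, lower it for underloaded ones
    prices = [p + 0.01 * (x - 50) for p, x in zip(prices, l)]

print([round(x, 1) for x in loads(prices)])  # loads driven to ~[50.0, 50.0]
```

The update needs no model of the users: it only pushes prices in the direction that cancels the observed load imbalance, and the feedback loop steers the equilibrium to balanced loads.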