“You’re the U. S. Robot’s psychologist, aren’t you?”
“Robopsychologist, please.”
“Oh, are robots so different from men, mentally?”
“Worlds different.” She allowed herself a frosty smile, “Robots are essentially decent.”

- Evidence, Isaac Asimov


How can we design efficient local algorithms for multiagent learning with global performance guarantees? 


Automation relegates many decision-making processes from humans to machines. In recent years, automation has been driven by artificial intelligence (AI), which enables trained machines to achieve expert-level performance in some tasks. Many of these tasks require humans to interact, so automating them will require AIs to interact, creating a networked environment where the decisions of one AI affect others and the data they learn from. Examples include autonomous vehicles, multi-robot systems, on-device learning, cloud computing, communication networks, the smart grid, and, more recently, AI agents based on large language models (LLMs). As game theory predicts, this interaction can lead to globally inefficient outcomes. However, machines follow programmatic objectives and protocols that, unlike humans, are limited not by selfish interests but by their local information and access to resources. This modern setting calls for new tools and paradigms, which are the focus of my research.


My research tackles the above question by bringing together game theory, distributed control, and multiagent learning. I study networks such as wireless networks, autonomous vehicles, the smart grid, epidemics and social networks, and edge computing. In these networks, designing scalable and efficient algorithms requires overcoming the following challenges:



In such restricted environments, also known as multiplayer bandits, agents can still learn from the information that is indirectly carried by the actions of others, which affect their environment and the reward values they observe. We have shown that learning from reward values alone, with limited communication of these values between neighboring networked agents, is, surprisingly, enough to maximize the sum of rewards of all agents under a general interaction structure.
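As a toy illustration of this idea (a hypothetical setup, not the algorithm from our work): two agents each choose between two arms, each observes only its own reward, and neighbors exchange reward values so that every agent can track the sum of rewards. Each agent then runs epsilon-greedy on the estimated reward sum for its own arms.

```python
import random

# Hypothetical joint-reward table: (a0, a1) -> (r0, r1).
# The reward *sum* is maximized at the joint action (1, 0).
REWARDS = {
    (0, 0): (1.0, 1.0), (0, 1): (0.0, 1.0),
    (1, 0): (2.0, 2.5), (1, 1): (1.0, 0.0),
}

def learn(T=20000, eps=0.1, seed=0):
    rng = random.Random(seed)
    means = [[0.0, 0.0] for _ in range(2)]   # estimated reward sum per own arm
    counts = [[0, 0] for _ in range(2)]
    for _ in range(T):
        acts = tuple(
            rng.randrange(2) if rng.random() < eps
            else (0 if means[i][0] >= means[i][1] else 1)
            for i in range(2)
        )
        # Each agent observes its own reward; the values are exchanged
        # between neighbors, so every agent learns the sum.
        total = sum(REWARDS[acts])
        for i, a in enumerate(acts):
            counts[i][a] += 1
            means[i][a] += (total - means[i][a]) / counts[i][a]
    return [0 if m[0] >= m[1] else 1 for m in means]

print(learn())  # greedy joint action after learning
```

Note that each agent optimizes the shared reward sum rather than its own reward, which is what steers the pair toward the globally optimal joint action instead of a selfishly stable but inefficient one.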


To achieve stronger performance guarantees, we can leverage the structure of the specific interaction of interest. In resource allocation, the challenge is that the agents do not know how others value the same resource. We have designed distributed resource allocation algorithms with near-optimal performance and little to no communication overhead. In congestion games with cooperative players, each player can observe its own trip time but not the load on the roads or the trip times of others. We have designed a distributed cooperative routing algorithm with no communication that learns the optimal routing.
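To make the congestion-game setting concrete, here is a minimal sketch (a toy model, not our algorithm): four players choose one of two roads, and a player's trip time equals the load on its road. Round-robin best response, where in practice each player would estimate its trip times from samples, converges to an equilibrium that here is also the balanced, welfare-optimal routing.

```python
# Toy congestion game: n_players choose among n_roads; the trip time on a
# road equals the number of players using it (linear congestion).
def best_response_dynamics(n_players=4, n_roads=2, rounds=10):
    choice = [0] * n_players            # everyone starts on road 0
    for _ in range(rounds):
        for i in range(n_players):
            loads = [0] * n_roads
            for j, r in enumerate(choice):
                if j != i:
                    loads[r] += 1
            # move to the road with the smallest resulting trip time
            choice[i] = min(range(n_roads), key=lambda r: loads[r] + 1)
    return choice

choice = best_response_dynamics()
print([choice.count(r) for r in range(2)])  # balanced loads: [2, 2]
```

Congestion games are potential games, so this sequential best-response process is guaranteed to terminate at a Nash equilibrium; in this symmetric linear example the equilibrium also minimizes the total trip time.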


Practical environments are often dynamic and include “states” that evolve as a function of the agents’ actions or external factors. Currently, the main tool for dealing with dynamic multiagent systems is reinforcement learning (RL). Multiagent RL (MARL) algorithms often assume that, while training, all agents can communicate perfectly and share a common reward. Such an assumption is only suitable for a small group of agents. For large-scale networks where centralized training is infeasible, most current MARL algorithms are either heuristic or converge to a Nash equilibrium, which can suffer from poor global performance. Instead, agents must learn to coordinate online after deployment, so the main challenge becomes the locality of agent information. To overcome these issues, we tackle dynamic multiagent learning bottom-up by gradually adding dynamic elements. The goal is a toolbox of modular components that can be combined as necessary, depending on the scenario. Following this approach, we have designed admission control for an open game where the set of players is dynamic, and a distributed learning algorithm for when the global objective is dynamic.


On-device learning trains machine learning models on edge devices rather than one large model on a central server. This distributed paradigm offers far more scalable machine learning: users can keep their private data on their devices, and training can use more data than can be uploaded to a server. In on-device learning, the loss function of one device often depends on the model parameters of others. Such interdependent training arises in multitask learning, distillation, and when the training data is correlated across devices. Interdependent training defines a game between the devices, where the actions are their model parameters and the cost functions are their training loss functions. Current on-device learning paradigms, such as federated learning, are not built to account for interdependency. With interdependent training, the gradient of the sum of losses with respect to any device's model parameters is different from the gradient of its own loss function. If we ignore interdependency and let each device minimize only its own loss function, the resulting stationary points can be poor Nash equilibria with a suboptimal sum of losses.
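A minimal quadratic example makes the gap concrete (the losses are hypothetical, chosen only for illustration): two devices with coupled losses L_i(x_0, x_1) = (x_i - 1)^2 + x_0 * x_1. Descending only one's own gradient converges to a Nash equilibrium whose total loss is strictly worse than the minimum of the sum of losses.

```python
# Each device i holds a scalar "model parameter" x_i, with coupled loss
#   L_i(x_0, x_1) = (x_i - 1)**2 + x_0 * x_1.

def grads_own(x):
    # device i descends only its own gradient: dL_i/dx_i = 2*(x_i - 1) + x_other
    return [2 * (x[0] - 1) + x[1], 2 * (x[1] - 1) + x[0]]

def grads_sum(x):
    # gradient of L_0 + L_1 picks up the cross term from the *other* loss
    return [2 * (x[0] - 1) + 2 * x[1], 2 * (x[1] - 1) + 2 * x[0]]

def descend(grad_fn, steps=2000, lr=0.05):
    x = [0.0, 0.0]
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

def total_loss(x):
    return (x[0] - 1) ** 2 + (x[1] - 1) ** 2 + 2 * x[0] * x[1]

nash = descend(grads_own)  # Nash equilibrium: x = (2/3, 2/3), total loss 10/9
opt = descend(grads_sum)   # sum minimizer:    x = (1/2, 1/2), total loss 1
print(total_loss(nash), total_loss(opt))
```

Here independent training loses about 11% of the achievable total loss; with stronger coupling between the losses, the Nash equilibrium can be arbitrarily worse than the cooperative optimum.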


Game design is the choice of reward functions such that the global objective is optimized at a Nash equilibrium. Any reward functions that can be computed using only local information are a valid choice. This type of design exploits the fact that with cooperative players, the reward functions do not model what selfish players want to maximize. Instead, the reward functions are part of the players' distributed algorithm. The dramatic advantage of this approach is that it yields fully distributed algorithms that require no communication between the players. We have applied game design in wireless networks and energy allocation. Recently, we have designed an automatic game design scheme based on supervised machine learning.
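One classical instance of this principle (a textbook construction, not necessarily the design we use) is the marginal-contribution reward u_i(a) = W(a) - W(a with player i replaced by a null action), which turns the game into a potential game with potential W, so every welfare maximum is a Nash equilibrium. A small sketch with a hypothetical welfare function W:

```python
import itertools

ACTIONS = [0, 1, 2]   # action 0 serves as the "null" action

def W(a):
    # hypothetical global welfare for two players, chosen for illustration
    return a[0] + a[1] - 0.4 * a[0] * a[1]

def u(i, a):
    # marginal-contribution reward of player i
    null = list(a)
    null[i] = 0
    return W(a) - W(tuple(null))

def is_nash(a):
    # no player can gain by a unilateral deviation under the designed rewards
    for i in range(2):
        for alt in ACTIONS:
            dev = list(a)
            dev[i] = alt
            if u(i, tuple(dev)) > u(i, a) + 1e-9:
                return False
    return True

best = max(itertools.product(ACTIONS, repeat=2), key=W)
print(best, is_nash(best))  # the welfare maximizer is a Nash equilibrium
```

The key property is that a unilateral deviation by player i changes u_i by exactly the change in W, so players greedily improving their designed rewards are implicitly hill-climbing the global objective.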


In game control, the challenge is to control unknown multiagent dynamics based only on high-level feedback. This scenario arises when the multiagent system is already deployed, and we can only manage it post-design by adjusting system parameters. This is often the reality in epidemic control, transportation, energy, and supply chains. The Nash equilibrium of such large-scale systems is typically inefficient. We have shown that the manager of an unknown resource allocation game can learn to shift the Nash equilibrium to guarantee load balancing, by adjusting the prices of resources online based only on monitoring the total loads. More recently, we have designed a general online algorithm that learns to optimize the equilibria of an unknown game.
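A toy sketch of this price-adjustment loop (an illustrative model, not our algorithm): a manager observes only the total load on each of two resources and updates prices by p_r <- p_r + lr * (load_r - target). The users' response to cost is modeled here by a logit choice, which the manager never needs to know.

```python
import math

def loads(prices, base=(1.0, 3.0), n_users=100, beta=1.0):
    # users' aggregate response: logit choice over effective costs b_r + p_r
    # (unknown to the manager, who only observes the resulting loads)
    costs = [b + p for b, p in zip(base, prices)]
    weights = [math.exp(-beta * c) for c in costs]
    z = sum(weights)
    return [n_users * w / z for w in weights]

prices = [0.0, 0.0]
for _ in range(500):
    l = loads(prices)
    # raise the price of overloaded resources, lower it for underloaded ones
    prices = [p + 0.01 * (x - 50) for p, x in zip(prices, l)]

print([round(x, 1) for x in loads(prices)])  # loads driven to ~[50.0, 50.0]
```

The update needs no model of the users: it only pushes prices in the direction that cancels the observed load imbalance, and the feedback loop steers the equilibrium to balanced loads.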