Coordination-free pricing for resource allocation over networks: models and regret guarantees
ABSTRACT:
The pervasive integration of edge intelligence has created opportunities for proactive demand management in critical infrastructure systems such as power or transportation networks. Existing approaches often involve soliciting customer preferences or navigating complex negotiation protocols to coordinate customer behavior before real-time operations, making them challenging to implement in the real world. This has ignited growing interest in data-driven methods that gradually and autonomously learn to manage demand, without any direct communication with users. However, because such data-driven actions are taken under uncertainty about the users' response, they may lead to high operational costs and/or violations of the networks' physical safety constraints. This talk will explore several recent methods developed in our group that guarantee data-driven control actions taken to shape user demand in safety-critical systems remain network-safe regardless of model uncertainty, and discuss their accompanying theoretical bounds on the growth of regret.
Network Control Across Scales and Applications to Traffic Management
This talk will present new analysis and control methods for dynamical systems across scales, with applications to traffic management at the vehicle, road link, and network levels to enhance efficient use of capacity and to reduce congestion. We will start with a hierarchical control approach and illustrate it on vehicle and road link control examples. We will also highlight experiments on autonomous docking of model marine vessels and platooning of self-driving cars in urban traffic. Moving on to the network level, we will explore how theories of population games and evolutionary dynamics can be applied to traffic routing. Population games model the strategy distribution among large groups of agents, while evolutionary dynamics explore how the distribution evolves as agents learn better strategies. We will discuss how these concepts can be integrated with realistic traffic flow models and leveraged to inform decisions by transportation authorities.
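As a toy illustration of the population-game viewpoint described above (illustrative code, not material from the talk), the sketch below runs replicator dynamics on a two-route congestion game; the latency functions, parameter values, and function names are all assumptions chosen for the example.

```python
import numpy as np

def replicator_routing(latencies, x0, steps=5000, dt=0.01):
    """Replicator dynamics for a congestion game over parallel routes.

    latencies: per-route latency functions of that route's traffic share.
    x0: initial distribution of drivers over the routes (sums to 1).
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        costs = np.array([l(s) for l, s in zip(latencies, x)])
        avg_cost = x @ costs
        # Routes cheaper than average gain share; costlier routes lose it.
        x += dt * x * (avg_cost - costs)
        x = np.clip(x, 0.0, None)
        x /= x.sum()
    return x

# Two parallel roads with latencies l1(s) = s and l2(s) = 2s.
# Latencies equalize at the Wardrop equilibrium share x1 = 2/3.
x_eq = replicator_routing([lambda s: s, lambda s: 2 * s], [0.5, 0.5])
```

The fixed points of these dynamics at which all used routes have equal latency are exactly the Wardrop equilibria, which is what makes this class of models attractive for reasoning about interventions by a transportation authority.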
Learning equilibria in games with bandit feedback
A significant challenge in managing large-scale engineering systems, such as energy and transportation networks, lies in enabling autonomous decision-making by interacting agents. Game theory offers a framework for modeling and analyzing this class of problems. In many practical applications, each player has only partial information about the cost functions and actions of others, so a decentralized learning approach is essential for devising optimal strategies. My talk will focus on recent advances in decentralized learning in static and Markov games under bandit feedback. It highlights the challenges relative to single-agent learning and presents algorithms with provable convergence. The first part will focus on learning in continuous-action static games. The second part will explore Markov games, presenting our learning approaches for zero-sum Markov games and for coarse-correlated equilibria in general-sum Markov games. I will conclude by presenting a few open research directions.
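As a minimal, self-contained illustration of decentralized learning under bandit feedback (a textbook setup, not the algorithms from the talk), two EXP3 learners can play matching pennies in self-play: each player observes only its own realized reward, yet the time-averaged marginals drift toward the mixed Nash equilibrium (1/2, 1/2). All names and parameter values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_probs(w, gamma):
    """Mix the normalized weights with uniform exploration."""
    p = w / w.sum()
    return (1 - gamma) * p + gamma / len(w)

def matching_pennies_selfplay(T=20000, gamma=0.05, eta=0.05):
    """Two EXP3 learners playing matching pennies with bandit feedback."""
    w1, w2 = np.ones(2), np.ones(2)
    counts1, counts2 = np.zeros(2), np.zeros(2)
    for _ in range(T):
        p1, p2 = exp3_probs(w1, gamma), exp3_probs(w2, gamma)
        a1, a2 = rng.choice(2, p=p1), rng.choice(2, p=p2)
        r1 = 1.0 if a1 == a2 else 0.0      # rewards rescaled to [0, 1]
        r2 = 1.0 - r1                      # zero-sum structure
        # Importance-weighted bandit update: only the played arm is updated.
        w1[a1] *= np.exp(eta * r1 / p1[a1])
        w2[a2] *= np.exp(eta * r2 / p2[a2])
        w1 /= w1.sum(); w2 /= w2.sum()     # renormalize to avoid overflow
        counts1[a1] += 1; counts2[a2] += 1
    return counts1 / T, counts2 / T

f1, f2 = matching_pennies_selfplay()
```

The per-round behavior cycles rather than converges, which is precisely the kind of gap between single-agent and multi-agent learning that the talk addresses with algorithms having provable convergence guarantees.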
Resilient Average Consensus via Distributed Detection and Multi-hop Communication
In this talk, we discuss the problem of average consensus in multi-agent systems in a resilient manner when some of the agents are subject to failures or attacks. The objective is for the non-faulty (normal) agents to reach the exact average of their initial values even in the presence of erroneous effects from malicious agents. We study this resilient average consensus problem in a fairly general setting for multi-agent networks with directed topologies. The problem poses three challenges: (i) the average of the normal agents' values must be preserved within the system over time; (ii) once a neighbor is found to be faulty, the effects it has had on the iteration up to that point must be removed; and (iii) malicious agents that are neighbors may collaborate to mislead the normal ones, further raising the security risk. To address these issues, we introduce a distributed iterative algorithm for resilient average consensus consisting of two stages at each iteration: detection and averaging. In the detection stage, the algorithm identifies malicious agents using only information from direct in-neighbors. In the averaging stage, we extend the applicability of the so-called push-sum ratio algorithm so that, after detection, the normal agents can remove the effects introduced by malicious agents up to that point. An important feature is the use of multi-hop communication to ensure sufficient information exchange among the agents. We show how this can enhance the security level of the network even though information relayed by malicious agents may itself be contaminated.
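For readers unfamiliar with the underlying averaging primitive, here is a minimal sketch of the fault-free push-sum ratio algorithm on a directed graph; the detection stage and multi-hop relaying discussed in the talk are not modeled, and the function and variable names are illustrative.

```python
import numpy as np

def push_sum(values, out_neighbors, iters=200):
    """Push-sum ratio consensus on a directed graph.

    values: initial scalar held by each node.
    out_neighbors: dict mapping node -> list of out-neighbors.
    Each node splits its mass equally among itself and its out-neighbors;
    when the graph is strongly connected, the ratio x/y at every node
    converges to the network-wide average of the initial values.
    """
    n = len(values)
    P = np.zeros((n, n))              # column-stochastic mixing matrix
    for j in range(n):
        targets = [j] + list(out_neighbors[j])   # implicit self-loop
        for i in targets:
            P[i, j] = 1.0 / len(targets)
    x = np.array(values, dtype=float)
    y = np.ones(n)                    # auxiliary "weight" variable
    for _ in range(iters):
        x = P @ x
        y = P @ y
    return x / y

# Directed ring on 4 nodes; the exact average 3.0 is recovered everywhere.
est = push_sum([1.0, 2.0, 3.0, 6.0], {0: [1], 1: [2], 2: [3], 3: [0]})
```

The key property exploited in the resilient extension is that the ratio x/y is insensitive to the (generally non-uniform) stationary distribution of the column-stochastic mixing matrix, which is what makes the method applicable on directed topologies.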
Localized spectral representations for reinforcement learning in networked MDPs
Networked Markov Decision Processes (MDPs) pose a significant challenge to efficient learning due to the exponential growth of the overall state-action space with the number of agents. Recent works have attempted to address this by leveraging the locally decaying structure of the dynamics to develop localized policies that depend only on a truncated portion of the state-action graph. However, these approaches are limited to finite state-action spaces, leaving the more realistic and challenging scenario of large or continuous state-action spaces in networked MDPs as an open problem. In this work, we leverage recent advances in spectral representations of Q-value functions, which have enabled efficient learning for single-agent MDPs, to derive a networked version of a localized spectral representation for the Q-function of each agent, utilizing the exponential decay structure of the network dynamics. Building on these local spectral representations, we design efficient algorithms for networked MDPs with large or even continuous state-action spaces. We illustrate the efficacy of our approach with a series of representative experiments.
System theory tools for optimization and learning in complex and distributed systems
In this talk I will address control and learning scenarios requiring the online solution of (possibly distributed) optimization problems. I will first highlight some key challenges arising in these problems, such as the online, closed-loop nature of the learning strategy, structural requirements on the policy, the large scale of the systems, and local communication constraints. Then I will show how system-theoretic tools can be used to design and analyze solution strategies and to gain insight into the performance properties of the algorithms. Energy and robotic systems are key sources of concrete scenarios in which these challenges arise; applications to these domains will be shown along with future perspectives.
Inverse Equilibrium Problems and Applications to Transportation Networks
Equilibrium modeling is common in a variety of fields such as game theory, transportation science, and systems biology. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning. We apply this general framework to transportation networks. Using real traffic data from the Boston area and New York City, we estimate origin-destination flow demand matrices and the per-road cost (congestion) functions drivers implicitly use for route selection. Given demand information, estimating the congestion functions can be formulated as a convex optimization problem. Jointly estimating demand and congestion functions, however, leads to a bi-level optimization problem that is non-convex. We will present two approaches for solving large instances of the joint problem. We will also discuss extensions to multi-class transportation networks that can account for multiple vehicle types. Given the information estimated from observed equilibria (i.e., traffic flow data), one can formulate and solve a system-optimum problem to identify socially optimal flows for the transportation network. The ratio of total latency under a user-optimal policy to that under a system-optimal policy is the so-called Price of Anarchy, quantifying the efficiency loss of selfish actions compared to socially optimal ones. We find that the Price of Anarchy can be quite substantial, suggesting that there is scope for control actions to steer the equilibrium to a socially optimal one.
We will discuss what some of these actions may be and how to prioritize interventions.
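As a concrete numerical illustration of the Price of Anarchy (a textbook example, not the Boston or New York estimates from the talk), Pigou's two-road network can be computed directly; the variable names below are chosen for the example.

```python
import numpy as np

# Pigou's network: one unit of demand split (x, 1 - x) over two roads,
# with latencies l1(x) = x (congestible) and l2 = 1 (constant).
def total_latency(x):
    return x * x + (1 - x) * 1.0

# User (Wardrop) equilibrium: no driver can switch to a faster road.
# Since l1(x) <= 1 for all x in [0, 1], everyone takes road 1: x_ue = 1.
x_ue = 1.0

# System optimum: minimize total latency over the split (grid search).
grid = np.linspace(0.0, 1.0, 100001)
x_so = grid[np.argmin(total_latency(grid))]       # -> 0.5

poa = total_latency(x_ue) / total_latency(x_so)   # -> 4/3
```

Even in this two-link network the selfish equilibrium is 33% worse than the social optimum, which gives a sense of the efficiency gap that interventions such as tolling aim to close.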
Thompson Sampling for Channel Selection in Networked Estimation and Control
The use of machine learning techniques for control has gained increasing attention in recent years. Learning-based estimation and control holds the promise of enabling the solution of problems that are difficult or even intractable using traditional control design techniques. Despite significant progress, several issues remain, e.g., in relation to stability guarantees, robustness, and computational cost. This talk presents some of our recent work on networked control systems with uncertainties and illustrates how posterior sampling techniques can be used for their design. We focus on a basic architecture where sensor measurements and control signals are transmitted over lossy communication channels that introduce random packet dropouts. At any time instant, one out of several available channels can be chosen for transmission. Since the channel dropout probabilities are unknown, finding the best channel requires learning from transmission outcomes. We study a scenario where learning of the channel dropout probabilities and control are carried out simultaneously. The coupling between the learning dynamics and the control system dynamics raises challenges in relation to stability and performance. To facilitate fast learning, we propose to select channels using Bayesian posterior sampling, also called Thompson Sampling. This talk elucidates conditions that guarantee that the resulting system will be stochastically stable and characterises performance in terms of the control regret.
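To make the channel-selection mechanism concrete, here is a minimal Beta-Bernoulli Thompson Sampling sketch for picking among lossy channels; the control loop, stability analysis, and regret characterization of the talk are not modeled, and the delivery probabilities and parameter names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def thompson_channel_selection(success_probs, T=5000):
    """Thompson Sampling over lossy channels with unknown delivery rates.

    success_probs: true (unknown to the controller) delivery probability
    of each channel; Beta(1, 1) priors are updated from each transmission
    outcome (delivered / dropped).
    """
    k = len(success_probs)
    alpha = np.ones(k)               # prior + observed successes
    beta = np.ones(k)                # prior + observed failures
    picks = np.zeros(k, dtype=int)
    for _ in range(T):
        # Sample a plausible delivery rate for each channel from its
        # posterior and transmit on the channel whose sample is largest.
        theta = rng.beta(alpha, beta)
        c = int(np.argmax(theta))
        delivered = rng.random() < success_probs[c]
        alpha[c] += delivered
        beta[c] += 1 - delivered
        picks[c] += 1
    return picks

# Three channels with delivery rates 0.5, 0.7, 0.9: the sampler quickly
# concentrates its transmissions on the most reliable channel.
picks = thompson_channel_selection([0.5, 0.7, 0.9])
```

Posterior sampling explores automatically: channels with wide posteriors occasionally produce the largest sample, so they keep being tried until the data rules them out, which is what yields the favorable regret behavior exploited in the talk.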