Abstract: TBD
There are many important engineering problems, such as new material design and active flow control, in which one wishes to compute an optimal policy in a Markov decision process (MDP) with general, potentially infinite-dimensional, state and action spaces. The traditional approach to such problems is to identify finite state and action representations, potentially approximating the MDP, and then to solve the resulting finite MDP using a proximal policy optimization (PPO) algorithm.
In this talk, we take a different approach to deriving policy gradient algorithms for general MDPs. An MDP is viewed as the optimization of an objective function defined over certain linear operators on general function spaces. Using the well-established perturbation theory of linear operators, this viewpoint allows one to identify derivatives of the objective function with respect to those linear operators. This leads to generalizations of many well-known results in reinforcement learning to settings with general state and action spaces. Prior results of this type were established only for finite-state, finite-action MDPs and for settings with certain linear function approximations. The framework also leads to new low-complexity PPO-type reinforcement learning algorithms for general state and action space MDPs, as well as a class of PPO algorithms for finite MDPs.
Some open problems will be discussed at the end of the talk.
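As a concrete anchor for the classical finite-state, finite-action setting that the talk generalizes, the sketch below runs policy gradient ascent on a toy two-state, two-action MDP with a softmax policy. All of the numbers are made up for illustration, and finite-difference gradients stand in for the operator-theoretic derivatives developed in the talk.

```python
import numpy as np

# Toy 2-state, 2-action MDP (all numbers illustrative, not from the talk).
P = np.array([  # P[s, a, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
])
R = np.array([[1.0, 0.0],   # R[s, a]: expected one-step reward
              [0.5, 2.0]])
gamma = 0.9

def value(theta):
    """Exact discounted value of the softmax policy, evaluated at state 0."""
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    P_pi = np.einsum("sa,sat->st", pi, P)   # closed-loop transition matrix
    r_pi = (pi * R).sum(axis=1)             # expected reward under pi
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)  # v = r + gamma*P*v
    return v[0]

# Plain gradient ascent on the policy parameters; finite differences
# stand in for the derivative of the objective in the operators.
theta, eps, lr = np.zeros((2, 2)), 1e-5, 0.2
for _ in range(2000):
    base = value(theta)
    g = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        t = theta.copy()
        t[idx] += eps
        g[idx] = (value(t) - base) / eps
    theta += lr * g

print(round(value(theta), 2))
```

The exact policy evaluation via the linear solve is what becomes an operator equation in the general-state-space setting described above.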
Bio: Abhishek Gupta is an Associate Professor in the Department of Electrical and Computer Engineering at The Ohio State University. He earned his Ph.D. and M.S. in Aerospace Engineering (2014) and an M.S. in Applied Mathematics (2012) from the University of Illinois at Urbana-Champaign, following his B.Tech. in Aerospace Engineering from IIT Bombay (2009). His research focuses on stochastic control, probability, and game theory, with applications to transportation markets, electricity markets, and the cybersecurity of control systems. In 2019, he received the Lumley Research Award from The Ohio State University.
Abstract: Motivated by applications in online marketplaces such as ride-hailing platforms and payment channel networks, we study a single-server queue with state-dependent arrival control. The service operator dynamically chooses the arrival rate as a function of the current queue length and receives a reward determined by the induced rate, capturing objectives such as throughput, revenue, or social welfare. The goal is to design control policies that simultaneously achieve high long-run operating reward and low congestion, measured by the expected steady-state queue length. We adopt a regret-based framework relative to an optimal benchmark and characterize the efficiency-reward trade-off under an ε-optimal reward constraint. Our results reveal a sharp dichotomy between small-market and large-market regimes. In small markets, any admissible control, including state-independent policies, incurs poor efficiency, with the expected queue length growing on the order of 1/ε. In contrast, in large markets, state-dependent policies can achieve substantially better performance. When the reward function exhibits sufficient curvature, the optimal queue length scales as O(1/√ε); otherwise, it scales as O(log 1/ε). For each regime, we establish universal lower bounds on the achievable efficiency and construct simple state-dependent policies that attain these bounds. Our results provide a non-asymptotic heavy-traffic characterization for queues with dynamic arrivals and offer structural insights into the design of efficient pricing and admission control policies.
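The small-market intuition, that a near-capacity state-independent policy forces a queue length of order 1/ε while a state-dependent throttle does much better, can be checked with a truncated birth-death stationary-distribution computation. The rates below are made up for illustration and are not taken from the paper.

```python
import numpy as np

def mean_queue_length(lam, mu=1.0, N=2000):
    """Steady-state E[Q] of a birth-death queue with state-dependent
    arrival rate lam(q) and service rate mu, truncated at level N."""
    w = np.ones(N + 1)
    for q in range(1, N + 1):
        w[q] = w[q - 1] * lam(q - 1) / mu   # detailed balance weights
    pi = w / w.sum()                        # stationary distribution
    return (np.arange(N + 1) * pi).sum()

eps = 0.01
# State-independent policy running close to capacity: E[Q] ~ 1/eps.
flat = mean_queue_length(lambda q: 1 - eps)
# Simple state-dependent policy that throttles arrivals above a threshold.
throttled = mean_queue_length(lambda q: 1 - eps if q < 10 else 0.5)
print(flat, throttled)
```

With ε = 0.01, the state-independent policy yields an expected queue length near 99, while the threshold policy keeps it an order of magnitude smaller, mirroring the dichotomy described in the abstract.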
Bio: Sushil Varma is an assistant professor in the Industrial and Operations Engineering Department at the University of Michigan, Ann Arbor. His research interests include queueing theory and revenue management with applications in online marketplaces, load balancing, and stochastic processing/matching networks. Sushil's thesis received the GT Sigma Xi Best Ph.D. Thesis Award and the ACM SIGMETRICS Doctoral Dissertation Award, and was a finalist in the INFORMS TSL Dissertation Award Competition.
We study incentive design when multiple principals simultaneously design mechanisms for their respective teams in environments with strategic spillovers. The interdependence of principals’ mechanism choices creates a generalized game in which each principal’s set of incentive-compatible mechanisms depends on the choices of others. Following a classic example by Myerson (1982), such games may lack an equilibrium due to discontinuities in the correspondence of incentive-compatible mechanisms. We establish general conditions for equilibrium existence by introducing a novel approach that tracks both the outcome distributions along the truthful-obedient path and the sets of outcome distributions achievable through unilateral deviations, thereby providing a foundation for analyzing a wide range of multi-principal mechanism design settings with team production and agency problems.
Abstract: TBD
Abstract: TBD
Speakers: TBD
Presenters: TBD
To enable a smart and autonomous system to be cognizant, taskable, and adaptive in exploring an unknown and unstructured environment, robotic decision-making relies on learning a parameterized knowledge representation. However, a fundamental challenge in deriving the parameterized representation is the undesirable trade-off between computational efficiency and model fidelity. This talk addresses this challenge in the context of underwater vehicle target tracking in unknown marine environments. To improve the fidelity of the reduced-order model, we develop a learning method that generates a non-Markovian neural-symbolic abstraction of the system dynamics. Such abstraction is guaranteed to improve modeling accuracy. Further, taking advantage of the abstracted model, we develop a hierarchical planner that translates human-specified missions directly into a set of executable actions at low computational cost. The proposed hierarchical planner applies to both single-agent and multi-agent scenarios, enabling cognizant and adaptive decision-making in complex environments.
Bio: Mengxue Hou received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology in 2022 and the B.S. degree in Electrical Engineering from Shanghai Jiao Tong University in 2016. She was a Lillian Gilbreth Postdoctoral Fellow in the College of Engineering, Purdue University, from 2022 to 2023. Since 2023, she has been an Assistant Professor of Electrical Engineering at the University of Notre Dame. She serves as an editor for the Journal of Intelligent & Robotic Systems and as an Associate Editor for IEEE ACC, CDC, and ICRA. Her research interests include robotics, AI, control theory, and shared autonomy.
High-dimensional motor coordination tasks require humans to learn how to map high-dimensional motor commands to low-dimensional task outcomes using a combination of proprioceptive feedback and exteroceptive feedback (e.g., visual cursor motion). Because these mappings are redundant and unknown, performance alone is a poor proxy for latent sensorimotor skill, motivating models and interventions that explicitly reason about learning dynamics. In this talk, I will present a dynamical and control-theoretic model of human sensorimotor learning in such settings. Learning is modeled as the evolution of a latent skill state over low-dimensional motor synergies, making learning tractable despite the high dimensionality of the motor system. This framework explains canonical trade-offs (exploration versus exploitation, speed versus accuracy, and flexibility versus performance) and is validated in body-machine interface experiments in which participants learn to control a cursor through high-dimensional hand kinematics. Building on this model, I will show how skill-aware interventions can be designed. Stochastic nonlinear model predictive control (SNMPC) is used to generate adaptive target-sequencing curricula that optimize long-horizon learning, while robotic nudge design is formulated as a partially observable decision problem to shape the learner's latent skill state by regulating exploration rather than merely correcting instantaneous error.
Bio: Vaibhav Srivastava received his B.Tech. degree in mechanical engineering from the Indian Institute of Technology Bombay, Mumbai, India, in 2007. He earned an M.S. (2011) and Ph.D. (2012) in mechanical engineering, along with an M.A. in statistics (2012), from the University of California, Santa Barbara. Dr. Srivastava is currently an Associate Professor of Electrical and Computer Engineering at Michigan State University, with additional affiliations in Mechanical Engineering and the Cognitive Science Program. From 2013 to 2016, he served as a Lecturer and Associate Research Scholar in the Department of Mechanical and Aerospace Engineering at Princeton University. He is a senior member of the IEEE. His research focuses on Cyber-Physical Human Systems, with an emphasis on mixed human-robot teams, networked multi-agent systems, neuroscience, and autonomous aerial and ground vehicles.
Abstract: TBD
We investigate the robustness of multi-agent learning in strongly monotone games with bandit feedback. While previous research has developed learning algorithms that achieve last-iterate convergence to the unique Nash equilibrium (NE) at a polynomial rate, we demonstrate that all such algorithms are vulnerable to adversaries capable of poisoning even a single agent's utility observations. Specifically, we propose an attacking strategy such that for any given time horizon, the adversary can mislead any multi-agent learning algorithm to converge to a point other than the unique NE with a corruption budget that grows sublinearly in time. To further understand the inherent robustness of these algorithms, we characterize the fundamental trade-off between convergence speed and the maximum tolerable total utility corruptions for two example algorithms, including the state-of-the-art one. Our theoretical and empirical results reveal an intrinsic efficiency-robustness trade-off: the faster an algorithm converges, the more vulnerable it becomes to utility poisoning attacks.
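The flavor of the attack can be illustrated with a minimal sketch: simultaneous gradient play on a two-player strongly monotone quadratic game, with first-order feedback used as a simple stand-in for the bandit setting and made-up payoffs. A constant bias injected into one player's gradient feedback moves the limit point away from the unique NE at the origin.

```python
import numpy as np

# Illustrative strongly monotone game (not from the paper):
#   J1(x) = x1^2 + x1*x2,  J2(x) = x2^2 + x1*x2  ->  unique NE at (0, 0).
def grad(x):
    return np.array([2 * x[0] + x[1], 2 * x[1] + x[0]])

def gradient_play(bias, eta=0.1, steps=2000):
    """Simultaneous gradient play; `bias` is a corruption the adversary
    injects into player 1's gradient feedback at every round."""
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        g = grad(x)
        g[0] += bias
        x = x - eta * g
    return x

clean = gradient_play(bias=0.0)      # converges to the NE (0, 0)
poisoned = gradient_play(bias=0.3)   # converges to (-0.2, 0.1) instead
print(clean, poisoned)
```

Because the game Jacobian is positive definite, clean gradient play contracts to the NE geometrically; the biased dynamics contract just as fast, but to a shifted fixed point, which is the mechanism behind the sublinear corruption budget in the abstract.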
Bio: Ermin Wei is an Associate Professor in the Electrical and Computer Engineering Department and the Industrial Engineering and Management Sciences Department at Northwestern University, with a courtesy appointment in Computer Science. Her research interests include distributed optimization methods, convex optimization and analysis, smart grid, communication systems, energy networks, and market economic analysis. Her team won 2nd place in the GO Competition Challenge 1, an electricity grid optimization competition organized by the Department of Energy.
Abstract: TBD
Speakers: TBD
Presenters: TBD
Dinner will be served overlooking the Purdue football field at the beautiful Buchanan Club.
Prof. Tamer Basar will share a few words.
A grand engineering challenge is to develop intelligent teams of distributed, heterogeneous autonomous robots that rapidly enable situational awareness in mixed indoor/outdoor, cluttered, and highly unknown environments. Such teams can transform emergency response, defense, and inspection and maintenance, especially in off-the-grid operations where the robots must rely on robot-to-robot communication only. Achieving this requires an interdisciplinary approach that integrates physical intelligence (adaptive actuation, sensing, computing, communication) and AI (resource-aware perception, learning, reasoning, acting) across the single- and multi-agent levels. In this talk, I will present my lab's physical AI efforts to enable scalability and reliability for control and distributed planning, via (i) novel morphable quadrotors that are agile, disturbance-resilient, and maneuverable, to rapidly and reliably enable situational awareness even in cluttered spaces; (ii) one-shot, self-supervised learning algorithms that enable the robots to adapt on the fly to unknown disturbances and dynamics that compromise control accuracy and planning; and (iii) distributed optimization algorithms that enable the robots to scale planning despite the low data rates of robot-to-robot communications. Key to our approach is treating the body of the system (the structure of the robot at the single-agent level, and the topology of the mesh network at the multi-agent level) as actively adaptable to optimize performance, upon integration with resource- and performance-aware optimization algorithms for adaptive planning and control. We build on tools of bandit learning, regret optimization (convex and submodular), adaptive nonlinear MPC, and submodular optimization. I will present evaluations on quadrotor hardware and in large-scale simulations (>40 robots) that account for realistic data-rate limitations. I will also discuss open challenges.
Bio: Vasileios Tzoumas is an assistant professor at the University of Michigan, Ann Arbor (postdoc @ MIT; Ph.D. @ University of Pennsylvania). His research is on co-adaptive physical and artificial intelligence for scalable and reliable cyber-physical systems in resource-constrained, unstructured, and contested environments, such as robots and networked systems in defense, disaster response, and smart cities. He is a recipient of an NSF CAREER Award on networked embodied intelligence, an Army Research Office YIP award on resource-aware distributed optimization and bandit learning, the Best Paper Award in Robot Vision at the 2020 IEEE International Conference on Robotics and Automation (ICRA), and an Honorable Mention from the 2020 IEEE Robotics and Automation Letters (RAL), and was a finalist for the Best Student Paper Award at the 2017 IEEE Conference on Decision and Control (CDC) for a paper on robust and adaptive resource allocation and multi-agent coordination.
Social choice plays two fundamental roles in the development and use of AI systems. First, the construction of AI models relies on aggregating heterogeneous human preferences to produce outcomes that best reflect what most people find desirable. Second, in decision support settings, users increasingly rely on multiple AI agents that provide heterogeneous recommendations, which must then be aggregated into a single course of action. Classical impossibility results in social choice theory highlight inherent tensions in designing such aggregation rules, making this task particularly challenging. We show that game-theoretic approaches and approximate solutions can help overcome these difficulties, allowing us to retain key normative properties while enabling practical and robust designs for AI-based applications.
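The classical tension referenced above is easy to exhibit concretely. The textbook Condorcet cycle below (three voters or AI agents, three alternatives; a standard example, not one specific to the talk) shows pairwise majority aggregation failing to produce any winner.

```python
# Textbook Condorcet cycle: three ranked ballots over alternatives a, b, c.
ballots = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]

def majority_prefers(x, y):
    """True if a strict majority of ballots ranks x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

# a beats b, b beats c, yet c beats a: majority preference cycles,
# so no alternative is a Condorcet winner.
print(majority_prefers("a", "b"), majority_prefers("b", "c"),
      majority_prefers("c", "a"))  # -> True True True
```

Any aggregation rule must break such cycles somehow, which is where the game-theoretic and approximation-based designs mentioned in the abstract come in.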
Abstract: TBD
Ensemble control deals with the problem of using a finite-dimensional control input to simultaneously steer an infinite number of dynamical systems. It originated in the control of quantum spin systems and finds applications in neuroscience, social science, and engineered systems such as robotics. The main challenge of controlling an ensemble system is rooted in the requirement that the control input be generated irrespective of the individual system. Over the past two decades, controllability of ensemble systems has been addressed extensively and is understood to a great extent. The problem of feedback stabilization of such systems remains, however, largely open. In this talk, we address this open problem. Specifically, we consider discrete ensembles of single-input linear scalar control systems. Assuming that all the individual systems are unstable, we investigate whether there exist linear feedback control laws that can asymptotically stabilize the ensemble system. We provide necessary/sufficient conditions for the feasibility of pole placement in the left half plane and for feedback stabilizability of the ensemble systems.
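For intuition on the single-gain obstruction, consider a finite stand-in for the discrete ensembles in the talk: scalar systems xdot_i = a_i x_i + b_i u_i closed with one common static gain, u_i = k x_i. The parameters below are made up; a common k must place every closed-loop pole a_i + b_i k in the open left half plane at once.

```python
import numpy as np

# Made-up ensemble of unstable scalar systems xdot_i = a_i x_i + b_i u_i,
# all sharing one static feedback gain: u_i = k x_i.
a = np.array([0.5, 1.0, 2.0])   # open-loop poles, all unstable
b = np.array([1.0, 2.0, 1.0])   # input gains, all of one sign

def common_gains(a, b, grid=np.linspace(-10.0, 10.0, 2001)):
    """Grid search for gains k that place every closed-loop pole
    a_i + b_i * k in the open left half plane simultaneously."""
    return [k for k in grid if np.all(a + b * k < 0)]

ks = common_gains(a, b)
# Feasible iff k < min_i(-a_i / b_i); for these parameters the threshold is -2.
print(len(ks) > 0)
```

For a finite ensemble with input gains of one sign, a sufficiently aggressive common gain works; the subtlety the talk addresses is what survives of this picture when the ensemble is infinite and the poles or gains accumulate.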
Bio: Xudong Chen received the B.S. degree in Electronic Engineering from Tsinghua University, Beijing, China, in 2009, and the Ph.D. degree in Electrical Engineering from Harvard University, Cambridge, Massachusetts, in 2014. He is currently an Associate Professor in the Department of Electrical and Systems Engineering at Washington University in St. Louis. He is an awardee of the 2020 Air Force Young Investigator Program and a recipient of the 2021 NSF CAREER Award, the 2021 Donald P. Eckman Award, and the 2023 A.V. Balakrishnan Early Career Award. His current research interests are in control theory, stochastic processes, optimization, network science, and game theory.
Grab a lunchbox and hang out or head home!