Reward inference (learning a reward model from human preferences) is a critical intermediate step in Preference-based Reinforcement Learning (PbRL), such as Reinforcement Learning from Human Feedback (RLHF) for fine-tuning Large Language Models (LLMs). In practice, reward inference faces fundamental challenges such as distribution shift, reward model overfitting, and problem misspecification. An alternative approach is direct policy optimization without reward inference, such as Direct Preference Optimization (DPO), which provides a much simpler pipeline but only works for the bandit setting or deterministic MDPs. This talk introduces new PbRL algorithms for general MDPs, general preference models beyond the Bradley-Terry model, and unknown link functions. The key idea is to use a policy perturbation approach based on stochastic zeroth-order optimization. We will discuss the convergence rates of these algorithms in terms of the number of policy gradient iterations, the number of trajectory samples, and the number of preference queries per iteration.
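As an illustrative sketch of the policy perturbation idea (a toy construction, not the speaker's algorithm), the following runs two-point zeroth-order gradient ascent on a stand-in concave objective; in the PbRL setting the value difference between the two perturbed policies would instead be inferred from preference queries:

```python
import numpy as np

rng = np.random.default_rng(0)
THETA_STAR = np.array([1.0, -2.0])

def J(theta):
    # toy concave stand-in for the expected return of policy parameters theta
    return -np.sum((theta - THETA_STAR) ** 2)

def zo_gradient(theta, delta=1e-2, n_dirs=32):
    # two-point zeroth-order estimate: perturb the parameters in random
    # directions and compare; in PbRL the comparison J(theta + delta*u) vs.
    # J(theta - delta*u) would come from preference queries over trajectories
    g = np.zeros_like(theta)
    for _ in range(n_dirs):
        u = rng.standard_normal(theta.shape)
        g += (J(theta + delta * u) - J(theta - delta * u)) / (2 * delta) * u
    return g / n_dirs

theta = np.zeros(2)
for _ in range(200):
    theta = theta + 0.05 * zo_gradient(theta)   # gradient ascent
```

After 200 iterations `theta` is close to the optimizer, despite the algorithm never evaluating a gradient directly, only comparisons of perturbed parameters.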
Bio: Lei Ying is currently a Professor at the Electrical Engineering and Computer Science Department of the University of Michigan, Ann Arbor, an IEEE Fellow, and an Editor-at-Large for the IEEE/ACM Transactions on Networking. His research is broadly in the interplay of complex stochastic systems and big data, including reinforcement learning, large-scale communication/computing systems for big-data processing, private data marketplaces, and large-scale graph mining.
There are many important engineering problems, such as new material design and active flow control, in which one desires to compute an optimal policy in a Markov decision process (MDP) with general state and action spaces (potentially infinite dimensional). The traditional approach to such problems is to identify finite state and action representations, approximating the MDP if necessary, and to solve the resulting finite MDP using a proximal policy optimization (PPO) algorithm.
In this talk, we take a different approach to deriving policy gradient algorithms for general MDPs. An MDP is viewed as the optimization of an objective function defined through certain linear operators on general function spaces. Using the well-established perturbation theory of linear operators, this viewpoint allows one to identify derivatives of the objective function with respect to those linear operators. This leads to generalizations of many well-known results in reinforcement learning to the case of general state and action spaces. Prior results of this type were established only in the finite-state, finite-action MDP setting and in settings with certain linear function approximations. The framework also leads to new low-complexity PPO-type reinforcement learning algorithms for general state and action space MDPs and to a class of PPO algorithms for finite MDPs.
Some open problems will be discussed at the end of the talk.
Bio: Abhishek Gupta is an Associate Professor in the Department of Electrical and Computer Engineering at The Ohio State University. He earned his Ph.D. and M.S. in Aerospace Engineering (2014) and an M.S. in Applied Mathematics (2012) from the University of Illinois at Urbana-Champaign, following his B.Tech. in Aerospace Engineering from IIT Bombay (2009). His research focuses on stochastic control, probability, and game theory, with applications to transportation markets, electricity markets, and the cybersecurity of control systems. In 2019, he received the Lumley Research Award from The Ohio State University.
Abstract: Motivated by applications in online marketplaces such as ride-hailing platforms and payment channel networks, we study a single-server queue with state-dependent arrival control. The service operator dynamically chooses the arrival rate as a function of the current queue length and receives a reward determined by the induced rate, capturing objectives such as throughput, revenue, or social welfare. The goal is to design control policies that simultaneously achieve high long-run operating reward and low congestion, measured by the expected steady-state queue length. We adopt a regret-based framework relative to an optimal benchmark and characterize the efficiency-reward trade-off under an ε-optimal reward constraint. Our results reveal a sharp dichotomy between small-market and large-market regimes. In small markets, any admissible control, including state-independent policies, incurs poor efficiency, with the expected queue length growing on the order of 1/ε. In contrast, in large markets, state-dependent policies can achieve substantially better performance. When the reward function exhibits sufficient curvature, the optimal queue length scales as O(1/ε^0.5); otherwise, it scales as O(log 1/ε). For each regime, we establish universal lower bounds on the achievable efficiency and construct simple state-dependent policies that attain these bounds. Our results provide a non-asymptotic heavy-traffic characterization for queues with dynamic arrivals and offer structural insights into the design of efficient pricing and admission control policies.
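To illustrate the kind of trade-off at play (a toy birth-death construction with hypothetical rates, not the talk's model or its optimal policies), one can compute the stationary distribution of a queue under a state-dependent arrival-throttling rule and read off both congestion and the long-run admitted rate:

```python
import numpy as np

def steady_state(lam, mu):
    """Stationary distribution of a birth-death queue with state-dependent
    arrival rates lam[q] (the arrival rate when the queue length is q) and
    service rate mu; states are 0..len(lam)."""
    w = np.ones(len(lam) + 1)
    for q in range(len(lam)):
        w[q + 1] = w[q] * lam[q] / mu   # detailed balance: pi(q+1)/pi(q) = lam(q)/mu
    return w / w.sum()

mu, Q = 1.0, 50
# a hypothetical state-dependent policy: throttle arrivals once the queue grows
lam = np.array([0.9 if q < 10 else 0.3 for q in range(Q)])
pi = steady_state(lam, mu)
mean_queue = np.arange(Q + 1) @ pi      # congestion measure
throughput = lam @ pi[:-1]              # long-run admitted rate (reward proxy)
```

Under this rule the queue keeps most of the throughput of a constant rate-0.9 policy while the expected queue length stays far below that policy's heavy-traffic level.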
Bio: Sushil Varma is an assistant professor in the Industrial and Operations Engineering Department of the University of Michigan, Ann Arbor. His research interests include queueing theory and revenue management with applications in online marketplaces, load balancing, and stochastic processing/matching networks. Sushil's thesis received the GT Sigma Xi Best Ph.D. Thesis Award and the ACM SIGMETRICS Doctoral Dissertation Award, and was a finalist in the INFORMS TSL Dissertation Award Competition.
We study incentive design when multiple principals simultaneously design mechanisms for their respective teams in environments with strategic spillovers. The interdependence of principals’ mechanism choices creates a generalized game in which each principal’s set of incentive-compatible mechanisms depends on the choices of others. As a classic example by Myerson (1982) shows, such games may lack equilibria due to discontinuities in the correspondence of incentive-compatible mechanisms. We establish general conditions for equilibrium existence by introducing a novel approach that tracks both the outcome distributions along the truthful-obedient path and the sets of outcome distributions achievable through unilateral deviations, thereby providing a foundation for analyzing a wide range of multi-principal mechanism design problems with team production and agency problems.
Control strategies that rely on shaping the information available to a decision-maker have become popular recently in robotics, engineering, computer science, and information economics. The canonical model for such strategic information transmission is known as Bayesian Persuasion and was introduced in the latter field by Kamenica and Gentzkow in 2011. As the name suggests, this setting assumes that the decision-maker whose information set is influenced acts fully rationally, and according to Bayes' rule, upon receiving messages from the sender/controller/influencer. This assumption not only delineates the kind of situations captured by the model but is also, as it turns out, central to its tractability. This talk presents extensions of this basic framework to situations of interest to cyber-socio-physical systems, notably in the presence of large populations of decision-makers, non-fully Bayesian agents, discontinuous utilities, and combinations thereof. Most of this work is joint with Olivier Massicot and Vijeth Hebbar.
Bio: Cedric Langbort is a Professor of Aerospace Engineering at the University of Illinois at Urbana-Champaign (UIUC), where he is also affiliated with the Decision & Control Group at the Coordinated Science Lab (CSL) and the Department of Electrical and Computer Engineering (0% appointment). He works on applications of control, game, and optimization theory to a variety of fields and co-founded and co-directed the Center for People & Infrastructures at CSL. His and his advisees’ work has garnered multiple recognitions, such as an NSF CAREER Award, a Siebel Energy Institute Research Award, an IEEE CDC Best Student Paper Award, and an NDSEG fellowship.
Socio-technical systems, ranging from online communities and organizational networks to robotic teams and multi-agent AI, are shaped by two intertwined processes: how agents coordinate within groups and how groups themselves form, dissolve, and reorganize. These processes operate at distinct time scales. At a tactical level, agents repeatedly interact and aggregate information through consensus dynamics; at a strategic level, they make game-theoretic exit–join decisions based on incentives, marginal contributions, and switching frictions. This talk develops a unified framework that couples fast within-coalition consensus with slower coalition reconfiguration under Aumann–Drèze value allocation, so that coalition surplus is generated endogenously from collective information aggregation. Coalition formation thus induces a sequence of state-dependent cooperative games embedded in a noncooperative deviation process, yielding a hybrid fast–slow switching game with endogenous payoffs.
The analysis reveals a structural insight central to socio-technical design: strategic instability can induce tactical unanimity. When barriers to movement across groups are low, repeated reconfiguration creates temporal connectivity across the population, driving global consensus even if coalition structures remain fluid. Conversely, high switching frictions stabilize structure but may sustain segregation or persistent polarization. We provide fixed-point characterizations of joint tactical–strategic equilibria, establish existence results, and derive conditions under which unanimity, fragmentation, or polarization emerge. By integrating cooperative game theory, noncooperative stability, and networked consensus dynamics, the framework offers a principled foundation for analyzing incentives, structure, and collective behavior in complex socio-technical systems.
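A minimal sketch of the temporal-connectivity phenomenon (a toy gossip example with hypothetical states, not the talk's coupled game dynamics): neither pairwise interaction below connects all three agents on its own, yet alternating between them drives global consensus:

```python
import numpy as np

def pairwise_avg(n, i, j):
    # one gossip step: agents i and j average their states
    W = np.eye(n)
    W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
    return W

# each interaction graph is disconnected on its own, but alternating
# between them makes the union connected over time
W01, W12 = pairwise_avg(3, 0, 1), pairwise_avg(3, 1, 2)
x = np.array([0.0, 3.0, 9.0])
for _ in range(200):
    x = W12 @ (W01 @ x)

print(x)  # ≈ [4. 4. 4.]: global consensus at the initial average
```

Because each averaging matrix is doubly stochastic, the population average is preserved while the repeated reconfiguration contracts disagreement, mirroring how fluid coalition structure can still yield tactical unanimity.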
Bio: Quanyan Zhu received a B.Eng. in Honors Electrical Engineering from McGill University in 2006, an M.A.Sc. from the University of Toronto in 2008, and a Ph.D. from the University of Illinois at Urbana-Champaign (UIUC) in 2013. After a postdoctoral stint at Princeton University, he is currently an associate professor in the Department of Electrical and Computer Engineering at New York University (NYU). He is an affiliated faculty member of the Center for Urban Science and Progress (CUSP) and the Center for Cyber Security (CCS) at NYU. His research interests include cyber-physical systems, multi-agent systems, and cybersecurity and resilience. He currently serves as chair of the technical committee on security and privacy for the IEEE Control Systems Society.
Speakers: TBD
Presenters: TBD
To enable a smart and autonomous system to be cognizant, taskable, and adaptive in exploring an unknown and unstructured environment, robotic decision-making relies on learning a parameterized knowledge representation. However, one fundamental challenge in deriving the parameterized representation is the undesirable trade-off between computational efficiency and model fidelity. This talk addresses this challenge in the context of underwater vehicle target tracking in unknown marine environments. To improve the fidelity of the reduced-order model, we develop a learning method that generates a non-Markovian neural-symbolic abstraction of the system dynamics. This abstraction is guaranteed to improve modeling accuracy. Further, taking advantage of the abstracted model, we develop a hierarchical planner that translates human-specified missions directly into a set of executable actions at low computational cost. The proposed hierarchical planner applies to both single-agent and multi-agent scenarios, enabling cognizant and adaptive decision-making in complex environments.
Bio: Mengxue Hou received her Ph.D. in Electrical and Computer Engineering from the Georgia Institute of Technology in 2022 and her B.S. in Electrical Engineering from Shanghai Jiao Tong University in 2016. She was a Lillian Gilbreth Postdoctoral Fellow in the College of Engineering at Purdue University from 2022 to 2023. Since 2023, she has been an Assistant Professor of Electrical Engineering at the University of Notre Dame. She serves as an editor for the Journal of Intelligent & Robotic Systems and as an Associate Editor for IEEE ACC, CDC, and ICRA. Her research interests include robotics, AI, control theory, and shared autonomy.
High-dimensional motor coordination tasks require humans to learn how to map high-dimensional motor commands to low-dimensional task outcomes using a combination of proprioceptive feedback and exteroceptive feedback (e.g., visual cursor motion). Because these mappings are redundant and unknown, performance alone is a poor proxy for latent sensorimotor skill, motivating models and interventions that explicitly reason about learning dynamics. In this talk, I will present a dynamical and control-theoretic model of human sensorimotor learning in such settings. Learning is modeled as the evolution of a latent skill state over low-dimensional motor synergies, making learning tractable despite the high dimensionality of the motor system. This framework explains canonical trade-offs: exploration versus exploitation, speed versus accuracy, and flexibility versus performance, and is validated in body-machine interface experiments in which participants learn to control a cursor through high-dimensional hand kinematics. Building on this model, I will show how skill-aware intervention can be designed. Stochastic nonlinear model predictive control (SNMPC) is used to generate adaptive target-sequencing curricula that optimize long-horizon learning, while robotic nudge design is formulated as a partially observable decision problem to shape the learner’s latent skill state by regulating exploration rather than merely correcting instantaneous error.
Bio: Vaibhav Srivastava received his B.Tech. degree in mechanical engineering from the Indian Institute of Technology Bombay, Mumbai, India, in 2007. He earned an M.S. (2011) and Ph.D. (2012) in mechanical engineering, along with an M.A. in statistics (2012), from the University of California, Santa Barbara. Dr. Srivastava is currently an Associate Professor of Electrical and Computer Engineering at Michigan State University, with additional affiliations in Mechanical Engineering and the Cognitive Science Program. From 2013 to 2016, he served as a Lecturer and Associate Research Scholar in the Department of Mechanical and Aerospace Engineering at Princeton University. He is a senior member of the IEEE. His research focuses on Cyber-Physical Human Systems, with an emphasis on mixed human-robot teams, networked multi-agent systems, neuroscience, and autonomous aerial and ground vehicles.
Abstract: TBD
This talk is motivated by the possibility of a small number of automated vehicles (AVs) that may soon be present on our roadways, and the impacts they will have on traffic flow. This automation may take the form of fully autonomous vehicles without human intervention (SAE Level 5) or, as is already the case in many modern vehicles, may take the form of driver-assist features such as adaptive cruise control (ACC) and other SAE Level 1 and 2 features. Regardless of the extent of automation, the introduction of such vehicles has the potential to substantially alter emergent properties of the flow while also providing new opportunities for control of the traffic flow. Understanding these effects, and how to control traffic to mitigate them, requires accurate plant dynamics models. Additionally, AVs and automation features may initially be quite costly and available only to a small number of road users, restricting the benefit to those who can afford them. Instead, growing research suggests that AVs can be used to improve traffic flow conditions for all road users, even those who do not use AV technology.
Bio: Raphael Stern is an Associate Professor in the Department of Civil, Environmental, and Geo-Engineering at the University of Minnesota. Prior to joining UMN, Raphael was a postdoctoral researcher in the Department of Informatics at the Technical University of Munich and a visiting scholar at Vanderbilt University's Institute for Software Integrated Systems. Raphael received his BS, MS, and PhD, all in Civil Engineering, from the University of Illinois at Urbana-Champaign. Raphael is a recipient of the National Science Foundation's CAREER Award.
We investigate the robustness of multi-agent learning in strongly monotone games with bandit feedback. While previous research has developed learning algorithms that achieve last-iterate convergence to the unique Nash equilibrium (NE) at a polynomial rate, we demonstrate that all such algorithms are vulnerable to adversaries capable of poisoning even a single agent's utility observations. Specifically, we propose an attacking strategy such that for any given time horizon, the adversary can mislead any multi-agent learning algorithm to converge to a point other than the unique NE with a corruption budget that grows sublinearly in time. To further understand the inherent robustness of these algorithms, we characterize the fundamental trade-off between convergence speed and the maximum tolerable total utility corruptions for two example algorithms, including the state-of-the-art one. Our theoretical and empirical results reveal an intrinsic efficiency-robustness trade-off: the faster an algorithm converges, the more vulnerable it becomes to utility poisoning attacks.
Bio: Ermin Wei is an Associate Professor in the Electrical and Computer Engineering Department and the Industrial Engineering and Management Sciences Department of Northwestern University, with a courtesy appointment in Computer Science. Wei's research interests include distributed optimization methods, convex optimization and analysis, smart grid, communication systems, energy networks, and market economic analysis. Her team won 2nd place in the GO Competition Challenge 1, an electricity-grid optimization competition organized by the Department of Energy.
We study the problem of learning stable matchings with unknown preferences in a decentralized and an uncoordinated setting. Here, decentralized means that players make decisions independently, without guidance from a central platform, while uncoordinated indicates that players do not synchronize their actions through communication or shared schedules. We begin by formulating the problem as a game under known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and any mixed NE can be rounded to a stable matching. We then show that in hierarchical markets, where agents prefer partners at higher levels, applying the Exponential Weights (EXP) learning algorithm for the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated manner. Furthermore, we prove that EXP converges locally and exponentially fast to a stable matching in general markets. In addition, we propose another decentralized and uncoordinated learning algorithm that converges globally to a stable matching with arbitrarily high probability. Finally, we identify stronger feedback conditions under which the market can be driven more rapidly toward an approximate stable matching. Our game-theoretic framework bridges the discrete problem of learning stable matchings with the continuous problem of learning NE in games with continuous strategy spaces.
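The exponential-weights update at the heart of the EXP algorithm is standard; the following toy sketch (a single agent with hypothetical noisy utilities over three potential partners, full-information feedback for simplicity) shows the play-probabilities concentrating on the most preferred option, the basic mechanism the decentralized matching dynamics build on:

```python
import numpy as np

def exp_weights(mean_utils, T, eta, rng):
    """Exponential-weights (EXP/Hedge) sketch: accumulate log-weights from
    observed utilities; probabilities concentrate on the best option."""
    w = np.zeros(len(mean_utils))           # log-weights
    for _ in range(T):
        r = mean_utils + 0.1 * rng.standard_normal(len(mean_utils))  # noisy utilities
        w += eta * r
    p = np.exp(w - w.max())                 # softmax, numerically stable
    return p / p.sum()

rng = np.random.default_rng(0)
# hypothetical mean utilities of one agent over three potential partners
p_final = exp_weights(np.array([0.2, 0.9, 0.5]), T=500, eta=0.5, rng=rng)
```

In the decentralized matching setting each agent runs such an update independently over partners, with utilities determined by who accepts whom, which is where the game-theoretic analysis in the talk comes in.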
Bio: Rasoul Etesami is an Associate Professor in the Department of Industrial and Systems Engineering at the University of Illinois Urbana-Champaign (UIUC), where he is also affiliated with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory. From 2016 to 2017, he was a Postdoctoral Research Fellow in the Department of Electrical Engineering at Princeton University and WINLAB. He received his Ph.D. in Electrical and Computer Engineering from UIUC. His research interests include the analysis of complex decision-making systems using tools from control theory, game theory, optimization, and learning theory. He received the Best CSL Ph.D. Thesis Award at UIUC in 2016, the NSF CAREER Award in 2020, the U.S. Air Force Young Investigator Award in 2023, the James Franklin Sharp Outstanding Teaching Award in 2023, as well as several other best teaching and research awards.
Speakers: TBD
Presenters: TBD
Dinner will be served overlooking the Purdue football field at the beautiful Buchanan Club.
Prof. Tamer Basar will share some words.
A grand engineering challenge is to develop intelligent teams of distributed, heterogeneous autonomous robots that rapidly enable situational awareness in mixed indoor/outdoor, cluttered, and highly unknown environments. Such teams can transform emergency response, defense, inspection, and maintenance, especially in off-the-grid operations where the robots must rely on robot-to-robot communication only. Achieving this requires an interdisciplinary approach that integrates physical intelligence (adaptive actuation, sensing, computing, communication) and AI (resource-aware perception, learning, reasoning, acting) across the single- and multi-agent levels. In this talk, I will present my lab’s physical AI efforts to enable scalability and reliability for control and distributed planning, via (i) novel morphable quadrotors that are agile, disturbance-resilient, and maneuverable to rapidly and reliably enable situational awareness even in cluttered spaces; (ii) one-shot, self-supervised learning algorithms that enable the robots to adapt on-the-fly to unknown disturbances and dynamics that compromise control accuracy and planning; and (iii) distributed optimization algorithms that enable the robots to scale planning, despite the low data rates of robot-to-robot communications. Key to our approach is to treat the body of the systems (the structure of the robot at the single-agent level, and the topology of the mesh network at the multi-agent level) as actively adaptable to optimize performance, upon integration with resource- and performance-aware optimization algorithms for adaptive planning and control. We build on tools of bandit learning, regret optimization (convex and submodular), adaptive nonlinear MPC, and submodular optimization. I will present evaluations in quadrotor hardware and in large-scale simulations (>40 robots) that account for realistic data-rate limitations. I will also discuss open challenges.
Bio: Vasileios Tzoumas is an assistant professor at the University of Michigan, Ann Arbor (postdoc at MIT; Ph.D. at the University of Pennsylvania). His research is on co-adaptive physical and artificial intelligence for scalable and reliable cyber-physical systems in resource-constrained, unstructured, and contested environments, such as robots and networked systems in defense, disaster response, and smart cities. He is a recipient of an NSF CAREER Award on networked embodied intelligence, an Army Research Office YIP award on resource-aware distributed optimization and bandit learning, the Best Paper Award in Robot Vision at the 2020 IEEE International Conference on Robotics and Automation (ICRA), and an Honorable Mention from the 2020 IEEE Robotics and Automation Letters (RAL), and was a Best Student Paper Award finalist at the 2017 IEEE Conference on Decision and Control (CDC) for a paper on robust and adaptive resource allocation and multi-agent coordination.
Social choice plays two fundamental roles in the development and use of AI systems. First, the construction of AI models relies on aggregating heterogeneous human preferences to produce outcomes that best reflect what most people find desirable. Second, in decision support settings, users increasingly rely on multiple AI agents that provide heterogeneous recommendations, which must then be aggregated into a single course of action. Classical impossibility results in social choice theory highlight inherent tensions in designing such aggregation rules, making this task particularly challenging. We show that game-theoretic approaches and approximate solutions can help overcome these difficulties, allowing us to retain key normative properties while enabling practical and robust designs for AI-based applications.
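The classical tension the abstract alludes to is visible already in the textbook three-voter Condorcet cycle, where pairwise majority aggregation yields no consistent ranking (an illustrative example, not the talk's construction):

```python
# three voters with cyclic preferences over {A, B, C}
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_prefers(x, y):
    # does a strict majority rank x above y?
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

# pairwise majority yields A > B, B > C, and C > A: no consistent ranking
print(majority_prefers("A", "B"), majority_prefers("B", "C"), majority_prefers("C", "A"))
# → True True True
```

Any aggregation rule must break such cycles somehow, which is precisely where game-theoretic and approximation-based designs trade off normative properties against practicality.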
Bio: Thanh Nguyen is the Lewis B. Cullman Rising Star Professor of Quantitative Methods at the Mitch Daniels School of Business, Purdue University. His research focuses on market design and decision sciences. He has published in leading journals including the American Economic Review, Journal of Political Economy, Management Science, and Operations Research. His work has been supported by grants from the National Science Foundation (NSF) and DARPA.
During the early stages of the COVID-19 pandemic, we observed that demographic and socioeconomic differences influenced disease outcomes. This was first observed in case counts and infection fatality rates, and later in vaccination rates for the first dose. These disparities are further exacerbated by choices regarding adherence to public health recommendations. In this talk, I will discuss three projects in which we analyzed and quantified the role of demographic and socioeconomic factors in disease transmission and how survey data can inform game-theoretic models. In the final part, I will show how economic status and perceived authority influence vaccination and social distancing decisions. Our findings reveal that susceptible and infected individuals differ in how they follow guidelines, depending on their own objectives. These results highlight the importance of data-driven mathematical models for understanding complex human responses to public health policies and mitigation strategies.
Bio: Pamela Martinez is an Assistant Professor of Microbiology and Statistics at the University of Illinois Urbana-Champaign. Prior to that, she was a Postdoctoral Fellow at the Center for Communicable Disease Dynamics at the Harvard School of Public Health and earned her Ph.D. in Ecology and Evolution from the University of Chicago. Her research focuses on applying mathematical and computational tools to study the population dynamics of human infectious diseases. She is particularly interested in understanding how environmental drivers, pathogen diversity, and host socio-demographic factors influence disease transmission and control.
Ensemble control deals with the problem of using a finite-dimensional control input to simultaneously steer an infinite number of dynamical systems. It originated in the study of quantum spin systems and finds applications in neuroscience, social science, and engineered systems such as robotics. The main challenge of controlling an ensemble system is rooted in the requirement that the control input be generated irrespective of the individual system. Over the last two decades, the controllability of ensemble systems has been addressed extensively and is understood to a great extent. The problem of feedback stabilization of such systems, however, remains largely open. In this talk, we address this open problem. Specifically, we consider discrete ensembles of scalar, single-input linear control systems. Assuming that all the individual systems are unstable, we investigate whether there exist linear feedback control laws that can asymptotically stabilize the ensemble system. We provide necessary/sufficient conditions for the feasibility of pole placement in the left half plane and for feedback stabilizability of the ensemble system.
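For a finite ensemble, the flavor of the problem can be sketched numerically (hypothetical parameters; the talk's discrete ensembles and conditions are more general): with one shared input, a linear feedback u = k·x couples the scalar systems into a rank-one-perturbed matrix, and when the pair (diag(a), b) is controllable the closed-loop poles can be placed in the left half plane, e.g. via Ackermann's formula:

```python
import numpy as np

# Ensemble of unstable scalar systems dx_i/dt = a_i x_i + b_i u sharing ONE
# input u; linear feedback u = k @ x couples them into A + b k^T.
a = np.array([1.0, 2.0, 3.0])   # hypothetical unstable ensemble parameters
b = np.array([1.0, 1.0, 1.0])
A = np.diag(a)
n = len(a)

# (diag(a), b) is controllable here since the a_i are distinct and b_i != 0
C = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
assert np.linalg.matrix_rank(C) == n

desired = np.array([-1.0, -2.0, -3.0])
coeffs = np.poly(desired)       # monic characteristic polynomial of desired poles
phiA = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
# Ackermann's formula: k = -e_n^T C^{-1} phi(A), so that u = k @ x
k = -(np.linalg.solve(C.T, np.eye(n)[:, -1])) @ phiA

closed = A + np.outer(b, k)
print(np.sort(np.linalg.eigvals(closed).real))  # ≈ [-3, -2, -1]
```

The interesting regime in the talk is when the ensemble is infinite (or a continuum), where such finite-dimensional pole-placement arguments no longer apply directly.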
Bio: Xudong Chen received the B.S. degree in Electronic Engineering from Tsinghua University, Beijing, China, in 2009, and the Ph.D. degree in Electrical Engineering from Harvard University, Cambridge, Massachusetts, in 2014. He is currently an Associate Professor in the Department of Electrical and Systems Engineering at Washington University in St. Louis. He is an awardee of the 2020 Air Force Young Investigator Program and a recipient of the 2021 NSF CAREER Award, the 2021 Donald P. Eckman Award, and the 2023 A.V. Balakrishnan Early Career Award. His current research interests are in the areas of control theory, stochastic processes, optimization, network science, and game theory.
Grab a lunchbox and hang out or head home!