Past Seminars

June 9, 2021

Miroslav Krstic (UCSD)

Title: Control has met learning: Aspirational lessons from adaptive control theory

Abstract & Bio


Abstract: Adaptive control is a field thanks to whose six decades of activity many challenges in combining feedback control with online learning have been either overcome or understood to be insurmountable. As such, the results in this field are not only a veritable checklist of properties that future learning-based control methods, including those related to reinforcement learning, should strive to guarantee, but also a checklist of properties demonstrated or deemed hopeless. I will touch on learning-based feedback designs across the entire spectrum in terms of reliance on models - from the conventional model-based adaptive control, to non-model-based RL, to the entirely model-free “extremum seeking.” I will focus on fundamental questions but illustrate them with examples from robotics, aquatic locomotion, road traffic, and semiconductor manufacturing.
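
As a concrete illustration of the model-free end of this spectrum, here is a minimal perturbation-based extremum-seeking loop (an editorial sketch, not code from the talk; the map J, gains, and dither parameters are made up for the example):

```python
# Minimal perturbation-based extremum seeking: find the minimizer of an
# unknown static map J using only measurements of J, no model or gradient.
import numpy as np

def extremum_seeking(J, theta0=0.0, a=0.2, omega=5.0, k=1.0, dt=1e-3, T=200.0):
    theta_hat = theta0
    for i in range(int(T / dt)):
        t = i * dt
        dither = a * np.sin(omega * t)
        y = J(theta_hat + dither)            # probe the unknown map
        theta_hat -= k * dither * y * dt     # demodulate: correlate y with the dither
    return theta_hat

# Unknown-to-the-controller map with minimum at theta = 2
print(extremum_seeking(lambda th: (th - 2.0) ** 2))   # converges to a value near 2.0
```

The dither both probes the unknown map and demodulates the measurement, so on average the update descends the gradient of J without ever modeling it.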


Bio: Miroslav Krstic studies adaptive control, extremum seeking, nonlinear and stochastic control, control of PDE systems including turbulent flows, and control of delay systems. He has co-authored 16 books and about 400 journal papers on these subjects. Since 2012 he has divided his time between his research and serving as Senior Associate Vice Chancellor for Research at UC San Diego. Krstic is a recipient of the Bellman Award, SIAM Reid Prize, ASME Oldenburger Medal, and a dozen other awards. He is a foreign member of the Serbian Academy of Sciences and Arts and Fellow of IEEE, IFAC, ASME, SIAM, AAAS, IET (UK), and AIAA-AF. He is the Editor-in-Chief of Systems & Control Letters and an Area Editor of Automatica.

June 2, 2021

Andreas Krause (ETH)
Title: Safe and Efficient Exploration in Reinforcement Learning

Abstract & Bio

Abstract: At the heart of Reinforcement Learning lies the challenge of trading off exploration -- collecting data for identifying better models -- against exploitation -- using the estimate to make decisions. In simulated environments (e.g., games), exploration is primarily a computational concern. In real-world settings, exploration is costly and potentially dangerous, as it requires experimenting with actions that have unknown consequences. In this talk, I will present our work towards improving the efficiency and rigorously reasoning about the safety of exploration in reinforcement learning. I will discuss approaches where we learn about unknown system dynamics through exploration, yet need to verify the safety of the estimated policy. Our approaches use Bayesian inference over the objective, constraints and dynamics, and -- under some regularity conditions -- are guaranteed to be both safe and complete, i.e., to converge to a natural notion of reachable optimum. I will also present recent results on harnessing uncertainty for improving the efficiency of exploration in model-based deep reinforcement learning, and on meta-learning suitable probabilistic models from related tasks.

Bio: Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center and Chair of the ETH AI Center, and co-founded the ETH spin-off LatticeFlow. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received ERC Starting Investigator and ERC Consolidator grants, the Deutscher Mustererkennungspreis, an NSF CAREER award, the Okawa Foundation Research Grant recognizing top young researchers in telecommunications as well as the ETH Golden Owl teaching award. His research on machine learning and adaptive systems has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019 and the ICML Test of Time award 2020. Andreas Krause served as Program Co-Chair for ICML 2018, and is regularly serving as Area Chair or Senior Program Committee member for ICML, NeurIPS, AAAI and IJCAI, and as Action Editor for the Journal of Machine Learning Research.


May 12, 2021

Jonathan How (MIT)
Title: Learning-based Planning and Control: Opportunities and Challenges
Slides: link

Abstract & Bio

Abstract: Machine learning-based techniques have recently revolutionized nearly every aspect of autonomy. In particular, deep reinforcement learning (RL) has rapidly become a powerful alternative to classical model-based approaches to decision-making, planning, and control. Despite the well-publicized successes of deep RL, its adoption in complex and/or safety-critical tasks at scale and in real-world settings is hindered by several key issues, including high sample complexity in large-scale problems, limited transferability, and lack of robustness guarantees. This talk explores our recently developed solutions that address these fundamental challenges for both single and multiagent RL. In addition, this talk highlights the complementary role that classical model-based techniques can play in synergy with data-driven methods in overcoming these issues. Real experiments with ground and aerial robots will be used to illustrate the effectiveness of the proposed techniques. The talk will conclude with an assessment of the state of the art and highlight important avenues for future research.


Bio: Jonathan P. How is the Richard C. Maclaurin Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology. He received a B.A.Sc. (aerospace) from the University of Toronto in 1987, and his S.M. and Ph.D. in Aeronautics and Astronautics from MIT in 1990 and 1993, respectively, and then studied for 1.5 years at MIT as a postdoctoral associate. Prior to joining MIT in 2000, he was an assistant professor in the Department of Aeronautics and Astronautics at Stanford University. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the American Institute of Aeronautics and Astronautics (AIAA). He was elected to the National Academy of Engineering (NAE) in 2021. Dr. How was the editor-in-chief of the IEEE Control Systems Magazine (2015-19) and is an associate editor for the AIAA Journal of Aerospace Information Systems and the IEEE Transactions on Neural Networks and Learning Systems. He was elected to the Board of Governors of the IEEE Control Systems Society (CSS) in 2019 and is a member of the IEEE CSS Technical Committee on Aerospace Control and the Technical Committee on Intelligent Control. He is the Director of the Ford-MIT Alliance and was a member of the USAF Scientific Advisory Board (SAB) from 2014-17. Dr. How’s research focuses on robust planning and learning under uncertainty with an emphasis on multiagent systems, and he was the planning and control lead for the MIT DARPA Urban Challenge team in 2007. His work has been recognized with multiple awards, including the 2020 IEEE CSS Distinguished Member Award, the 2020 AIAA Intelligent Systems Award, the 2002 Institute of Navigation Burka Award, the 2011 IFAC Automatica award for best applications paper, the 2015 AeroLion Technologies Outstanding Paper Award for Unmanned Systems, the 2015 winner of the IEEE Control Systems Society Video Clip Contest, the IROS Best Paper Award on Cognitive Robotics (2017 and 2019), the 2020 ICRA Best Paper Award in Service Robotics, and three AIAA Best Paper in Conference Awards (2011-2013). He received the Amazon Machine Learning Research Award in 2018 and 2020, and he was awarded the Air Force Commander's Public Service Award in 2017 for his contributions to the SAB.


May 5, 2021

Steve Brunton (University of Washington)

Title: Data-Driven Dynamical Systems and Control

Abstract & Bio

Abstract: Accurate and efficient reduced-order models are essential to understand, predict, estimate, and control complex, multiscale and nonlinear dynamical systems. Machine learning constitutes a growing set of powerful techniques to extract patterns and build models from data, complementing the existing theoretical, numerical and experimental efforts. These models should ideally be generalizable, interpretable and based on limited training data. In this talk, I will discuss several modern perspectives on data-driven control of nonlinear systems, including the dynamic mode decomposition (DMD), Koopman operator theory, and the sparse identification of nonlinear dynamics (SINDy) approach. SINDy in particular provides a general framework to discover the governing equations underlying a dynamical system simply from measurement data, leveraging advances in sparsity-promoting techniques and machine learning. The resulting models are parsimonious, balancing model complexity with descriptive ability while avoiding overfitting. This perspective, combining dynamical systems with machine learning and sparse sensing, is explored with the overarching goal of real-time closed-loop feedback control.
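
As a rough illustration of the SINDy idea described above (an editorial sketch using sequentially thresholded least squares; not the authors' pysindy library, and the toy system and candidate library are chosen only for this example):

```python
# SINDy-style sketch: discover dx/dt = Theta(x) Xi by sparse regression of
# measured derivatives onto a library of candidate terms.
import numpy as np

def library(X):
    # Candidate terms: 1, x, y, x^2, xy, y^2 (polynomials up to degree 2)
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def sindy(X, Xdot, threshold=0.05, n_iter=10):
    Theta = library(X)
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(n_iter):                      # sequentially thresholded least squares
        Xi[np.abs(Xi) < threshold] = 0.0
        for k in range(Xdot.shape[1]):
            big = np.abs(Xi[:, k]) >= threshold
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], Xdot[:, k], rcond=None)[0]
    return Xi

# Toy data from a damped oscillator dx/dt = -0.1x + 2y, dy/dt = -2x - 0.1y
t = np.linspace(0, 10, 2000)
X = np.column_stack([np.exp(-0.1 * t) * np.cos(2 * t), -np.exp(-0.1 * t) * np.sin(2 * t)])
Xdot = np.gradient(X, t, axis=0)
print(sindy(X, Xdot))   # all library terms are zeroed except the linear ones
```

On this toy oscillator, thresholding removes every candidate term except the linear ones, recovering a parsimonious model directly from data.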


Bio: Dr. Steven L. Brunton is an Associate Professor of Mechanical Engineering at the University of Washington. He is also Adjunct Associate Professor of Applied Mathematics and a Data Science Fellow at the eScience Institute. Steve received the B.S. in mathematics from Caltech in 2006 and the Ph.D. in mechanical and aerospace engineering from Princeton in 2012. His research combines machine learning with dynamical systems to model and control systems in fluid dynamics, biolocomotion, optics, energy systems, and manufacturing. He is a co-author of three textbooks, received the Army and Air Force Young Investigator Program awards, the Presidential Early Career Award for Scientists and Engineers (PECASE), and he was awarded the University of Washington College of Engineering junior faculty and teaching awards.

April 28, 2021

Patricio Antonio Vela (Georgia Tech)

Title: On the Structure of Learning: What’s in the Black Box?

Abstract & Bio

Abstract: Deep learning algorithms preceded the theory, so there is sometimes the viewpoint that deep learning is not well understood. This viewpoint is further reinforced by current work seeking to understand the optimization landscape of the training process and why it works, as well as to understand why certain negative outcomes occur in the deployment or testing phase. However, if one ignores the how of the learning process and focuses on the what of the learning process, deep networks are less mysterious and conform to what is known about best practice for signal/function approximation and estimation. Given this understanding, it is then possible to contemplate comparably designed “shallow” networks with equivalent or more favorable properties for low- to moderate-dimensional input/output regression problems. The shallow networks work well for feedback control systems and exhibit single- or few-shot learning. For higher dimensional and image-like inputs, deep networks assist with feature space generation, but the decisions made still rely on shallow learning theory. In these cases, deep learning is better understood as a process for learning the composition of a feature mapping and a decision/regression function. The output representation chosen for the second step influences the learning process and can have a demonstrable impact on the performance of the learnt solution. Strong performance is therefore just as dependent on structural choices made about the network’s architecture and loss functions (the what) as on the underlying learning algorithms and how they work. Understanding these properties is important for learning deployed in the closed loop.

Bio: Patricio A. Vela is an associate professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Dr. Vela's research focuses on geometric perspectives to control theory and computer vision, particularly how concepts from control and dynamical systems theory can serve to improve computer vision algorithms used in the decision-loop. More recent efforts expanding his research program involve studying the role of machine learning in adaptive control and autonomous robotics, and investigating how modern advances in adaptive and optimal control theory may improve locomotion effectiveness for biologically-inspired robotics. These efforts support a broad program to understand important research challenges associated with autonomous robotic operation in uncertain environments. Dr. Vela received a B.S. and a Ph.D. from the California Institute of Technology.

April 21, 2021

Necmiye Ozay (University of Michigan)
Title: Learning models and constraints with limited data

Abstract & Bio

Abstract: System identification has a long history with several well-established methods, in particular for learning linear dynamical systems from input/output data. While the asymptotic properties of these methods are well understood as the number of data points goes to infinity or the noise level tends to zero, how their estimates behave in the finite-data regime is relatively less studied. This talk will mainly focus on our analysis of the robustness of the classical Ho-Kalman algorithm and how it translates to non-asymptotic estimation error bounds as a function of the number of data samples. In the second part of the talk, I will describe a practical problem where a robot needs to learn safe behaviors from a limited number of demonstrations. We recast this problem as an inverse constraint learning problem, similar to inverse optimal control. Our experiments with several robotics problems show that (local) optimality can be a very strong prior in learning from demonstrations. I will conclude the talk with some open problems and directions for future research.
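
For readers unfamiliar with it, the classical Ho-Kalman algorithm analyzed in the first part can be sketched as follows (an editorial illustration for a SISO system with exactly known Markov parameters; the robustness analysis in the talk concerns what happens when these are estimated from finite, noisy data):

```python
# Ho-Kalman sketch: recover (A, B, C) up to similarity from Markov
# parameters h_k = C A^k B of a SISO system.
import numpy as np

def ho_kalman(h, n):
    # h[k] ~ C A^k B for k = 0, 1, ...; n is the assumed system order.
    m = len(h) // 2
    H  = np.array([[h[i + j]     for j in range(m)] for i in range(m)])  # Hankel
    Hs = np.array([[h[i + j + 1] for j in range(m)] for i in range(m)])  # shifted Hankel
    U, s, Vt = np.linalg.svd(H)
    O = U[:, :n] * np.sqrt(s[:n])                    # observability factor
    C_ctrl = np.sqrt(s[:n])[:, None] * Vt[:n, :]     # controllability factor
    A = np.linalg.pinv(O) @ Hs @ np.linalg.pinv(C_ctrl)
    B = C_ctrl[:, :1]
    C = O[:1, :]
    return A, B, C

# Toy example: true system with A = diag(0.9, 0.5), B = [1, 1]^T, C = [1, -1]
A0, B0, C0 = np.diag([0.9, 0.5]), np.ones((2, 1)), np.array([[1.0, -1.0]])
h = [(C0 @ np.linalg.matrix_power(A0, k) @ B0).item() for k in range(20)]
A, B, C = ho_kalman(h, n=2)
print(sorted(np.linalg.eigvals(A).real))   # ~ [0.5, 0.9]: eigenvalues recovered
```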

Bio: Necmiye Ozay received her B.S. degree from Bogazici University, Istanbul in 2004, her M.S. degree from the Pennsylvania State University, University Park in 2006 and her Ph.D. degree from Northeastern University, Boston in 2010, all in electrical engineering. She was a postdoctoral scholar at the California Institute of Technology, Pasadena between 2010 and 2013. She joined the University of Michigan, Ann Arbor in 2013, where she is currently an associate professor of Electrical Engineering and Computer Science. She is also a member of the Michigan Robotics Institute. Dr. Ozay’s research interests include hybrid dynamical systems, control, optimization and formal methods with applications in cyber-physical systems, system identification, verification & validation, autonomy and dynamic data analysis. Her papers have received several awards, including the Nonlinear Analysis: Hybrid Systems Prize Paper Award for the years 2014-2016. She has received the 1938E Award and a Henry Russel Award from the University of Michigan for her contributions to teaching and research, and five young investigator awards, including an NSF CAREER award.

April 14, 2021

Na Li (Harvard)
Title: Real-time Distributed Decision Making in Networked Systems

Abstract & Bio

Abstract: Monitoring and control for complex network systems are accelerated by the recent revolutions in sensing, computation, communication, and actuation technologies that boost the development and implementation of data-driven decision making. In this talk, we will focus on real-time distributed decision-making algorithms for networked systems. The first part will be on scalable multiagent reinforcement learning algorithms and the second part will be on model-free control methods for power systems based on continuous-time zeroth-order optimization methods. We will show that exploiting network structure or underlying physical dynamics will facilitate the design of scalable real-time learning and control methods.
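
As a rough sketch of the zeroth-order idea behind such model-free methods (an editorial simplification in discrete time, not the speaker's continuous-time algorithm; the cost function, perturbation size, and step size are made up for the example):

```python
# Zeroth-order (derivative-free) descent: estimate a gradient from cost
# evaluations only, then take a gradient-like step.
import numpy as np

def zeroth_order_step(cost, x, rng, delta=1e-2, step=0.1):
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    # Two-point estimate of the directional derivative along u
    g = (cost(x + delta * u) - cost(x - delta * u)) / (2 * delta) * u
    return x - step * g

# Example: drive a quadratic "system cost" toward its minimum without a model
cost = lambda x: np.sum((x - np.array([1.0, -2.0])) ** 2)
x = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(500):
    x = zeroth_order_step(cost, x, rng)
print(x)   # approaches [1, -2] using cost evaluations only
```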

Bio: Na Li is a Gordon McKay Professor in Electrical Engineering and Applied Mathematics at Harvard University. She received her bachelor's degree in Mathematics from Zhejiang University in 2007 and her Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 2013. She was a postdoctoral associate at the Massachusetts Institute of Technology from 2013 to 2014. Her research lies in control, learning, and optimization of networked systems, including theory development, algorithm design, and applications to real-world cyber-physical societal systems. She received the NSF CAREER Award (2016), the AFOSR Young Investigator Award (2017), the ONR Young Investigator Award (2019), the Donald P. Eckman Award (2019), and the McDonald Mentoring Award (2020), among other awards.


March 17, 2021

Munther Dahleh (MIT)
Title: Data Auctions with Externalities

Abstract & Bio

Abstract: The design of data markets has gained in importance as firms increasingly use predictions from machine learning models to make their operations more effective, yet need to externally acquire the necessary training data to fit such models. This is particularly true in the context of the Internet where an ever-increasing amount of user data is being collected and exchanged. The challenge in creating such a marketplace stems from the very nature of data as an asset: (i) it can be replicated at zero marginal cost; (ii) its value to a firm is inherently combinatorial (i.e. the value of a particular dataset depends on what other (potentially correlated) datasets are available); (iii) its value to a firm is dependent on which other firms get access to the same data; (iv) prediction tasks and the value of an increase in prediction accuracy vary widely between different firms, and so it is not obvious how to set prices for a collection of datasets with correlated signals; (v) finally, the authenticity and truthfulness of data is difficult to verify a priori without first applying it to a prediction task.


In this work, we consider the case with N competing firms and a monopolistic data seller. We demonstrate that modeling the utility of firms solely through the increase in prediction accuracy they experience reduces the complex, combinatorial problem of allocating and pricing multiple data sets to an auction of a single digital (freely replicable) good. We address an important property of such markets that has been given limited consideration thus far, namely the externality faced by a firm when data is allocated to other, competing firms. Addressing this is likely necessary for progress towards the practical implementation of such markets. Using this modeling abstraction, we obtain forms of the welfare-maximizing and revenue-maximizing auctions for such settings. We highlight how the form of the firms’ private information – whether they know the externalities they exert on others or that others exert on them – affects the structure of the optimal mechanisms. We find that in all cases, the optimal allocation rules turn out to be single thresholds (one per firm), in which the seller allocates all information or none of it to a firm. We demonstrate through simple examples how the externality affects both the allocation of information and the revenue generated. Finally, we discuss situations when this linear model fails.


This work is done in collaboration with Anish Agarwal, Thibaut Horel, and Maryann Rui.




March 10, 2021

Benjamin Van Roy (Stanford)
Title: Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

Abstract & Bio

Abstract: I will describe a reinforcement learning agent that, with specification only of agent state dynamics and a reward function, can operate with some degree of competence in any environment. The agent applies an optimistic version of Q-learning to update value predictions that are based on the agent’s actions and aleatoric states. We establish a regret bound demonstrating convergence to near-optimal per-period performance, where the time required is polynomial in the number of actions and aleatoric states, as well as the reward averaging time of the best policy among those for which actions depend on history only through aleatoric state. Notably, there is no further dependence on the number of environment states or averaging times associated with other policies or statistics of history.
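
For orientation, here is a generic sketch of optimistic Q-learning with a count-based bonus, in the spirit of the agent described above (an editorial simplification, not the exact algorithm or bonus from the talk; ordinary states stand in here for agent/aleatoric states):

```python
# Generic optimistic Q-learning update: optimism enters through a bonus
# that shrinks as a state-action pair is revisited.
import numpy as np
from collections import defaultdict

def optimistic_q_update(Q, counts, s, a, r, s_next, actions, gamma=0.99, c=1.0):
    counts[(s, a)] += 1
    n = counts[(s, a)]
    alpha = 1.0 / n                              # step size
    bonus = c / np.sqrt(n)                       # optimism: decays with visitation
    target = r + bonus + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q, counts, actions = defaultdict(float), defaultdict(int), [0, 1]
print(optimistic_q_update(Q, counts, s=0, a=1, r=1.0, s_next=0, actions=actions))
```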

Bio: Benjamin Van Roy is a Professor at Stanford University, where he has served on the faculty since 1998. His research focuses on the design, analysis, and application of reinforcement learning algorithms. Beyond academia, he leads a DeepMind Research team in Mountain View. He is a Fellow of INFORMS and IEEE.



March 3, 2021

Dorsa Sadigh (Stanford)
Title: Walking the Boundary of Learning and Interaction

Abstract & Bio


Abstract: There have been significant advances in the field of robot learning in the past decade. However, many challenges still remain when considering how robot learning can advance interactive agents such as robots that collaborate with humans. This includes autonomous vehicles that interact with human-driven vehicles or pedestrians, service robots collaborating with their users at homes over short or long periods of time, or assistive robots helping patients with disabilities. This introduces an opportunity for developing new robot learning algorithms that can help advance interactive autonomy.

In this talk, I will discuss a formalism for human-robot interaction built upon ideas from representation learning. Specifically, I will first discuss the notion of latent strategies — low dimensional representations sufficient for capturing non-stationary interactions. I will then talk about the challenges of learning such representations when interacting with humans, and how we can develop data-efficient techniques that enable actively learning computational models of human behavior from demonstrations, preferences, or physical corrections. Finally, I will introduce an intuitive control paradigm that enables seamless collaboration based on learned representations, and discuss how it can further be used to influence humans.

Bio: Dorsa Sadigh is an assistant professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie in the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017, and her bachelor’s degree in EECS from UC Berkeley in 2012. She has received the NSF CAREER Award, the AFOSR Young Investigator Award, the IEEE TCCPS Early Career Award, the Google Faculty Award, and the Amazon Faculty Research Award.

Feb 24, 2021

Stefano Soatto (UCLA/Amazon)

Title: Driving the Elephant through the Landscape of Deep Networks

Abstract & Bio

Abstract: The elephant in the room when interpreting the training of a deep network as an optimization problem, is the fact that optimal solutions are not good: One can easily drive the optimization residual to zero (great success!) and be left with a useless model (overfitting). So, where should we drive the solution to, if the minimum is undesirable? And how can we control the solution towards that more desirable state?

On the first question ('where'), I will first describe the “right” function we should optimize when training a deep network (Information Lagrangian), even if in practice we don't. I will then argue that, when training with stochastic gradient descent (SGD), we actually *do* optimize the Information Lagrangian, albeit unknowingly (inductive bias of SGD). (Technical aside: This result hinges on a connection between the Fisher Information in the weights of a deep network — which is computable from a finite dataset — and the Shannon Mutual Information between the trained weights and the dataset — that governs generalization but is otherwise incomputable. This is the Emergence Bound — or the magic of Deep Learning).

On the second question (‘how’), controlling SGD can be done by means of regularization. Steering the mastodon through the high-curvature bottlenecks of the high-dimensional and highly non-convex optimization residual undermines the classical (Tikhonov) view of regularization: For deep networks, regularization such as weight decay or data augmentation acts *not* by modifying the geometry of the loss landscape near convergence, but by influencing the transient dynamics in the early phases of learning. Turning the regularizer on or off after the initial transient, when the solution is still far from convergence, has little to no effect. Turning it on after the initial transient is tantamount to never regularizing.

These empirical observations point to rich unexplored territory in studying the transient dynamics of learning and how to control them: Deep Networks exhibit critical learning periods, just like biological organisms, even though they do not “age” and their “neuronal plasticity” (connectivity) is fixed at the outset. (At the same time, these observations also point to the relatively benign geometry of the loss landscape near convergence, and suggest that one may approximate deep networks locally with models that are linear-in-parameters, and suffer no loss when fine-tuning.)

Finally, since (a) every point in the loss landscape corresponds to a trained model, (b) a trained model is a (minimal sufficient) representation of a dataset, and (c) the dataset in turn specifies a learning task, I will describe how one can endow the space of learning tasks with a metric structure, so we can compute distances between learning tasks and, literally, navigate from one task to another (transfer learning) in real-world, large-scale commercial applications.

Joint work with Alessandro Achille, Giovanni Paolini, Pratik Chaudhari.

Biography: Professor Soatto received his Ph.D. in Control and Dynamical Systems from the California Institute of Technology in 1996; he joined UCLA in 2000 after being Assistant and then Associate Professor of Electrical and Biomedical Engineering at Washington University, and Research Associate in Applied Sciences at Harvard University. Between 1995 and 1998 he was also Ricercatore in the Department of Mathematics and Computer Science at the University of Udine, Italy. He received his D.Ing. degree (highest honors) from the University of Padova, Italy in 1992.

His general research interests are in Computer Vision and Nonlinear Estimation and Control Theory. In particular, he is interested in ways for computers to use sensory information (e.g. vision, sound, touch) to interact with humans and the environment.

Dr. Soatto is the recipient of the David Marr Prize (with Y. Ma, J. Kosecka and S. Sastry of U.C. Berkeley) for work on Euclidean reconstruction and reprojection up to subgroups. He also received the Siemens Prize with the Outstanding Paper Award from the IEEE Computer Society for his work on optimal structure from motion (with R. Brockett of Harvard). He received the National Science Foundation Career Award and the Okawa Foundation Grant. He is Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and a Member of the Editorial Board of the International Journal of Computer Vision (IJCV) and Foundations and Trends in Computer Graphics and Vision.

Feb 17, 2021

Daniela Rus (MIT)
Title: Learning Risk and Social Behavior in Mixed Human-Autonomous Vehicles Systems (no recording)

Abstract & Bio

Abstract: Deployment of autonomous vehicles (AVs) on public roads promises increases in efficiency and safety, and requires intelligent situation awareness. We wish to have autonomous vehicles that can learn to behave in safe and predictable ways, and are capable of evaluating risk, understanding the intent of human drivers, and adapting to different road situations. This talk describes an approach to learning and integrating risk and behavior analysis in the control of autonomous vehicles. I will introduce Social Value Orientation (SVO), which captures how an agent’s social preferences and cooperation affect interactions with other agents by quantifying the degree of selfishness or altruism. SVO can be integrated into control and decision making for AVs. I will provide recent examples of self-driving vehicles capable of adaptation.
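
For reference, the social value orientation formulation this line of work builds on (as I understand it; an editorial note, not part of the speaker's abstract) scores agent $i$'s outcomes as

$u_i = \cos(\varphi_i)\, r_i + \sin(\varphi_i)\, r_{\text{others}}$,

so the preference angle $\varphi_i$ interpolates between purely egoistic ($\varphi_i = 0$), prosocial ($\varphi_i \approx \pi/4$), and purely altruistic ($\varphi_i = \pi/2$) behavior, and can be estimated online from observed driving.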

Bio: Daniela Rus is the Andrew (1956) and Erna Viterbi Professor of Electrical Engineering and Computer Science, Director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, and Deputy Dean of Research in the Schwarzman College of Computing at MIT. She is also a visiting fellow at Mitre Corporation. Rus's research interests are in robotics and artificial intelligence. The key focus of her research is to develop the science and engineering of autonomy. Rus is a Class of 2002 MacArthur Fellow, a fellow of ACM, AAAI and IEEE, and a member of the National Academy of Engineering and of the American Academy of Arts and Sciences. She is the recipient of the Engelberger Award for robotics, IEEE RAS Pioneer award, and IJCAI John McCarthy Award. She earned her PhD in Computer Science from Cornell University.

Feb 10, 2021

Anca Dragan (UC Berkeley)

Title: Optimizing Intended Cost Functions

Abstract & Bio

Abstract: Work in controls and AI tends to focus on how to optimize a specified cost function, but costs that consistently lead to the desired behavior are not so easy to specify. Rather than optimizing the specified cost, which is already hard, robots have the much harder job of optimizing the intended cost. While the specified cost does not have as much information as we make our robots pretend, the good news is that humans constantly leak information about what the robot should optimize. In this talk, we will explore how to read the right amount of information from different types of human behavior -- and even the lack thereof.

Biography: Anca Dragan is an Assistant Professor in EECS at UC Berkeley, where she runs the InterACT lab. Her goal is to enable robots to work with, around, and in support of people. She works on algorithms that enable robots to a) coordinate with people in shared spaces, and b) learn what people want them to do. Anca did her PhD in the Robotics Institute at Carnegie Mellon University on legible motion planning. At Berkeley, she helped found the Berkeley AI Research Lab, is a co-PI for the Center for Human-Compatible AI, and has been honored by the Presidential Early Career Award for Scientists and Engineers (PECASE), and early career awards from Sloan, NSF, IEEE RAS, IJCAI, Okawa, and MIT TR35.

Feb 3, 2021

Marco Pavone (Stanford)

Title: Safe, Interaction-Aware Decision Making and Control for Robot Autonomy

Abstract & Bio

Abstract: In this talk I will present a decision-making and control stack for human-robot interactions by using autonomous driving as a motivating example. Specifically, I will first discuss a data-driven approach for learning multimodal interaction dynamics between robot-driven and human-driven vehicles based on recent advances in deep generative modeling. Then, I will discuss how to incorporate such a learned interaction model into a real-time, interaction-aware decision-making framework. The framework is designed to be minimally interventional; in particular, by leveraging backward reachability analysis, it ensures safety even when other cars defy the robot's expectations without unduly sacrificing performance. I will present recent results from experiments on a full-scale steer-by-wire platform, validating the framework and providing practical insights. I will conclude the talk by providing an overview of related efforts from my group on infusing safety assurances in robot autonomy stacks equipped with learning-based components, with an emphasis on adding structure within robot learning via control-theoretical and formal methods.

Bio: Dr. Marco Pavone is an Associate Professor of Aeronautics and Astronautics at Stanford University, where he is the Director of the Autonomous Systems Laboratory and Co-Director of the Center for Automotive Research at Stanford. Before joining Stanford, he was a Research Technologist within the Robotics Section at the NASA Jet Propulsion Laboratory. He received a Ph.D. degree in Aeronautics and Astronautics from the Massachusetts Institute of Technology in 2010. His main research interests are in the development of methodologies for the analysis, design, and control of autonomous systems, with an emphasis on self-driving cars, autonomous aerospace vehicles, and future mobility systems. He is a recipient of a number of awards, including a Presidential Early Career Award for Scientists and Engineers from President Barack Obama, an Office of Naval Research Young Investigator Award, a National Science Foundation Early Career (CAREER) Award, a NASA Early Career Faculty Award, and an Early-Career Spotlight Award from the Robotics Science and Systems Foundation. He was identified by the American Society for Engineering Education (ASEE) as one of America's 20 most highly promising investigators under the age of 40. He is currently serving as an Associate Editor for the IEEE Control Systems Magazine.

Jan 27, 2021

Yisong Yue (Caltech)
Title: Learning for Safety-Critical Control in Dynamical Systems

Abstract & Bio

Abstract: This talk describes ongoing research at Caltech on integrating learning into the design of safety-critical controllers for dynamical systems. To achieve control-theoretic safety guarantees while using powerful function classes such as deep neural networks, we must carefully integrate conventional control principles with learning into unified frameworks. I will focus primarily on two paradigms: integration in dynamics modeling and integration in policy/controller design. A special emphasis will be placed on methods that both admit relevant safety guarantees and are practical to deploy.


Bio: Yisong Yue is a professor of Computing and Mathematical Sciences at the California Institute of Technology. He was previously a research scientist at Disney Research. Before that, he was a postdoctoral researcher in the Machine Learning Department and the iLab at Carnegie Mellon University. He received a Ph.D. from Cornell University and a B.S. from the University of Illinois at Urbana-Champaign.


Yisong's research interests are centered around machine learning, and in particular getting theory to work in practice. To that end, his research agenda spans both fundamental and applied pursuits. In the past, his research has been applied to information retrieval, recommender systems, text classification, learning from rich user interfaces, analyzing implicit human feedback, data-driven animation, behavior analysis, sports analytics, experiment design for science, protein engineering, program synthesis, learning-accelerated optimization, robotics, and adaptive planning & allocation problems.

Jan 20, 2021

Florian Dörfler (ETH)
Title: Data-Enabled Predictive Control

Abstract & Bio

Abstract: We consider the problem of optimal and constrained data-driven control for unknown systems. A novel data-enabled predictive control (DeePC) algorithm is presented that computes optimal and safe control policies driving the unknown system along a desired trajectory while satisfying system constraints. Using a finite number of data samples from the unknown system, our algorithm is grounded in insights from subspace identification and behavioral systems theory. In particular, we use raw unprocessed data assembled in a matrix time series for data-driven estimation and prediction. In the case of deterministic linear time-invariant (LTI) systems, the DeePC algorithm is equivalent to standard Model Predictive Control (MPC). To cope with stochasticity and nonlinearity, we robustify the objective and constraints of DeePC by means of distributionally robust stochastic optimization, resulting in regularized problem formulations. Finally, we relate our direct data-driven control approach to the indirect approach consisting of sequential system identification and certainty-equivalence control. We conclude that the direct approach can be derived as a convex relaxation of the indirect approach, where the regularizations account for an implicit identification step. Our comparisons suggest that the direct approach is superior for control of nonlinear systems, whereas the indirect approach excels for stochastic LTI systems. All of our results are illustrated with experiments and simulations from aerial robotics, power electronics, and power systems.
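
A minimal sketch of the data-driven prediction step at the core of DeePC (an editorial illustration of the deterministic SISO LTI case with no regularization or constraints; the toy system and horizons are made up): a sufficiently rich recorded trajectory spans all other trajectories of the system, so future outputs can be predicted by matching a linear combination of Hankel-matrix columns to the recent past and a candidate input sequence.

```python
# Data-driven prediction from one recorded trajectory (DeePC building block).
import numpy as np

def hankel(w, L):
    # Columns are length-L windows of the signal w
    return np.column_stack([w[i:i + L] for i in range(len(w) - L + 1)])

def deepc_predict(u_d, y_d, u_ini, y_ini, u_future):
    T_ini, N = len(u_ini), len(u_future)
    L = T_ini + N
    Up, Uf = np.vsplit(hankel(u_d, L), [T_ini])
    Yp, Yf = np.vsplit(hankel(y_d, L), [T_ini])
    # Find a combination g consistent with the recent past and the candidate input
    A = np.vstack([Up, Yp, Uf])
    b = np.concatenate([u_ini, y_ini, u_future])
    g = np.linalg.lstsq(A, b, rcond=None)[0]
    return Yf @ g                                # predicted future outputs

# Recorded data from an unknown first-order system y[t+1] = 0.8 y[t] + u[t]
rng = np.random.default_rng(0)
u_d = rng.standard_normal(60)
y_d = np.zeros(60)
for t in range(59):
    y_d[t + 1] = 0.8 * y_d[t] + u_d[t]
print(deepc_predict(u_d, y_d, u_ini=u_d[-2:], y_ini=y_d[-2:], u_future=np.ones(5)))
```

The full DeePC controller then optimizes over the future input subject to constraints, with the regularizations described above handling noise and nonlinearity.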

Bio: Florian Dörfler is an Associate Professor at the Automatic Control Laboratory at ETH Zürich and the Associate Head of the Department of Information Technology and Electrical Engineering. He received his Ph.D. degree in Mechanical Engineering from the University of California at Santa Barbara in 2013, and a Diplom degree in Engineering Cybernetics from the University of Stuttgart in 2008. From 2013 to 2014 he was an Assistant Professor at the University of California Los Angeles. His primary research interests are centered around control, optimization, and system theory with applications in network systems, especially electric power grids. He is a recipient of the distinguished young research awards by IFAC (Manfred Thoma Medal 2020) and EUCA (European Control Award 2020). His students were winners or finalists for Best Student Paper awards at the European Control Conference (2013, 2019), the American Control Conference (2016), the Conference on Decision and Control (2020), the PES General Meeting (2020), and the PES PowerTech Conference (2017). He is furthermore a recipient of the 2010 ACC Student Best Paper Award, the 2011 O. Hugo Schuck Best Paper Award, the 2012-2014 Automatica Best Paper Award, the 2016 IEEE Circuits and Systems Guillemin-Cauer Best Paper Award, and the 2015 UCSB ME Best PhD award.

Jan 13, 2021

Sham Kakade (UW)
Title: The Provable Effectiveness of Policy Gradient Methods in Reinforcement Learning and Controls

Abstract & Bio

Abstract: Reinforcement learning is the dominant paradigm for how an agent learns to interact with the world in order to achieve some long-term objectives. Here, policy gradient methods are among the most effective methods in challenging reinforcement learning problems, because they: are applicable to any differentiable policy parameterization; admit easy extensions to function approximation; easily incorporate structured state and action spaces; are easy to implement in a simulation-based, model-free manner.

However, little is known about even their most basic theoretical convergence properties, including:

- do they converge to a globally optimal solution, say with a sufficiently rich policy class?

- how well do they cope with approximation error, say due to using a class of neural policies?

- what is their finite sample complexity?

This talk will cover a number of recent results on these basic questions and also provide the first approximation results which do not have worst-case dependencies on the size of the state space. We will highlight the interplay of theory, algorithm design, and practice.
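
For reference (standard textbook form, an editorial addition rather than anything specific to the talk), the gradient these methods follow is

$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot \mid s)}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a)\big]$,

where $d^{\pi_\theta}$ is the discounted state-visitation distribution and $Q^{\pi_\theta}$ the action-value function; natural policy gradient methods precondition this direction with the Fisher information of the policy.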

Joint work with: Alekh Agarwal, Jason Lee, Gaurav Mahajan

Bio: Sham Kakade is a professor in the Department of Computer Science and the Department of Statistics at the University of Washington and is also a senior principal researcher at Microsoft Research. His work is on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods in reinforcement learning; developing the mathematical foundations for the widely used linear bandit models and the Gaussian process bandit models; the tensor and spectral methodologies for provable estimation of latent variable models; the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award, the IBM Pat Goldberg best paper award, and INFORMS Revenue Management and Pricing Prize. He has been program chair for COLT 2011.

Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He completed his Ph.D. with Peter Dayan in computational neuroscience at the Gatsby Unit at University College London. He was a postdoc with Michael Kearns at the University of Pennsylvania.

Dec 2, 2020

Jean-Jacques Slotine (MIT)
Title: Contraction Analysis in Optimization, Learning, and Adaptive Prediction and Control

Abstract & Bio

Abstract: The human brain still largely outperforms robotic algorithms in most tasks, using computational elements 7 orders of magnitude slower than their artificial counterparts. Similarly, current large scale machine learning algorithms require millions of examples and close proximity to power plants, compared to the brain's few examples and 20W consumption. We study how modern nonlinear systems tools, such as contraction analysis and virtual dynamical systems, can yield quantifiable insights about collective computation, adaptation, and learning in large dynamical networks.

In optimization, most elementary results on gradient descent based on convexity of a time-invariant cost can be replaced by much more general results based on contraction. For instance, natural gradient descent converges to a unique equilibrium if it is contracting in some metric, with geodesic convexity of the cost corresponding to the special case of contraction in the natural metric. Semi-contraction of natural gradient in some metric implies convergence to a global minimum, and furthermore that all global minima are path-connected. Similar results apply to primal-dual optimization and game-theoretic contexts.
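
For readers new to the terminology, the standard contraction condition being invoked here (an editorial gloss): a system $\dot{x} = f(x,t)$ is contracting with rate $\lambda > 0$ in a metric $M(x,t) \succ 0$ if

$\frac{\partial f}{\partial x}^{\top} M + M \frac{\partial f}{\partial x} + \dot{M} \preceq -2\lambda M$,

in which case all trajectories converge to one another exponentially at rate $\lambda$.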

Stable concurrent learning and control of dynamical systems is the subject of adaptive nonlinear control. When multiple parameter choices are consistent with the data (be it due to insufficient richness of the task or to aggressive overparametrization), stable Riemannian adaptation laws can be designed to implicitly regularize the learned model. Thus, local geometry imposed during learning may be used to select parameter vectors for desired properties such as sparsity. The results can also be systematically applied to predictors for dynamical systems. Stable implicit sparse regularization can be exploited as well to select relevant dynamic models out of plausible physically-based candidates, as we illustrate in the contexts of Hamiltonian systems and mass-action kinetics. We also derive finite-time regret bounds for adaptive control and prediction with matched uncertainty in the stochastic setting.

Contraction-based adaptive controllers or predictors can also be used in transfer learning or sim2real contexts, where a feedback controller or predictor has been carefully learned for a nominal system, but needs to remain effective in real-time in the presence of significant but structured variations in parameters.

Finally, a key aspect of contraction tools is that they also suggest systematic mechanisms to build progressively more refined networks and novel algorithms through stable accumulation of functional building blocks and motifs.

Bio: Jean-Jacques Slotine is Professor of Mechanical Engineering and Information Sciences, Professor of Brain and Cognitive Sciences, and Director of the Nonlinear Systems Laboratory at the Massachusetts Institute of Technology. He received his Ph.D. from MIT in 1983, at age 23. After working at Bell Labs in the computer research department, he joined the MIT faculty in 1984. His research focuses on developing rigorous but practical tools for nonlinear systems analysis and control. These have included key advances and experimental demonstrations in the contexts of sliding control, adaptive nonlinear control, adaptive robotics, machine learning, and contraction analysis of nonlinear dynamical systems. Professor Slotine is the co-author of two graduate textbooks, “Robot Analysis and Control” (Asada and Slotine, Wiley, 1986), and “Applied Nonlinear Control” (Slotine and Li, Prentice-Hall, 1991) and is one of the most cited researchers in systems science and robotics. He was a member of the French National Science Council from 1997 to 2002 and of Singapore’s A*STAR SigN Advisory Board from 2007 to 2010. He currently is a member of the Scientific Advisory Board of the Italian Institute of Technology and a Distinguished Visiting Faculty at Google Brain. He is the recipient of the 2016 Oldenburger Award.

Nov 25, 2020

Claire Tomlin (UC Berkeley)
Title: Safe Learning in Robotics

Abstract & Bio

Abstract: In many applications of robot learning, guarantees that specifications are satisfied throughout the learning process are paramount. For the safety specification, we present a controller synthesis technique based on the computation of reachable sets, using optimal control and game theory. We present new methods for computing the reachable set, based on a functional approximation which has the potential to broadly alleviate its computational complexity. In the second part of the talk, we will present a toolbox of methods combining reachability with data-driven techniques, to enable performance improvement while maintaining safety. We will illustrate these “safe learning” methods on robotic platforms at Berkeley, including demonstrations of motion planning around people, and navigating in a priori unknown environments.
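
As background (an editorial note; sign conventions differ between reach and avoid computations, so this is only one common form), such reachable sets are typically obtained as sublevel sets of a value function solving a Hamilton-Jacobi-Isaacs equation

$\frac{\partial V}{\partial t} + \min_{u} \max_{d}\ \nabla_x V(x,t)^{\top} f(x,u,d) = 0, \qquad V(x,T) = l(x)$,

where $l$ encodes the target or unsafe set and the control $u$ and disturbance $d$ play a differential game.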

Bio: Claire Tomlin is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley, where she holds the Charles A. Desoer Chair in Engineering. Claire received the B.A.Sc. in EE from the University of Waterloo in 1992, M.Sc. in EE from Imperial College, London, in 1993, and the PhD in EECS from Berkeley in 1998. She held the positions of Assistant, Associate, and Full Professor at Stanford from 1998-2007, and in 2005 joined Berkeley. Claire works in hybrid systems and control, and integrates machine learning methods with control theoretic methods in the field of safe learning. She works in air traffic systems, unmanned air vehicle systems, and in systems biology.

Nov 18, 2020

Melanie Zeilinger (ETH)
Title: A Constrained Control Perspective on Safe Learning-Based Control

Abstract & Bio

Abstract: Various demonstrations show the potential of learning-based control paradigms. Providing safety guarantees when learning in closed-loop control systems, however, remains a central challenge for the widespread success of these promising techniques in real-life and industrial settings. Out of the different possible notions of safety, I will focus on the satisfaction of critical safety constraints in this talk, a common and intuitive form of specifying safety in many applications. We will then approach the question of safety for learning-based control from a constrained control perspective. Model predictive control (MPC) is an established control technique for addressing constraint satisfaction with demonstrated success in various industries. However, it requires a sufficiently descriptive system model, as well as a suitable formulation of the control objective to provide the desired guarantees and to be able to solve the problem via numerical optimization.

In this talk, I will discuss how MPC can provide a flexible framework for safe learning-based control, allowing us to overcome some of the individual difficulties of both MPC and available reinforcement learning methods. The first part will address learning for inferring a model of the system dynamics and how to use a characterization of the residual model uncertainty to design effective but cautious controllers. The main part of the talk will then focus on a recent approach leveraging MPC as a safety filter, which provides a modular scheme for augmenting high-performance learning-based controllers with constraint satisfaction properties. I will present two formulations - a stochastic variant for linear models and a robust formulation for nonlinear models - providing closed-loop constraint satisfaction guarantees. The results will be highlighted using examples from robotics.
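
In schematic form, the safety-filter idea described above solves, at every time step (an editorial sketch; the stochastic and robust formulations in the talk add further structure),

$\min_{u_0,\dots,u_{N-1}} \ \|u_0 - u_L(x)\|^2 \quad \text{s.t.} \quad x_{k+1} = f(x_k,u_k),\ x_0 = x,\ x_k \in \mathcal{X},\ u_k \in \mathcal{U},\ x_N \in \mathcal{S}_f$,

where $u_L(x)$ is the input proposed by the learning-based controller, and applies the first input of the minimizer; the invariance of the terminal safe set $\mathcal{S}_f$ is what carries the closed-loop constraint-satisfaction guarantee.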

Bio: Melanie Zeilinger is an Assistant Professor at the Department of Mechanical and Process Engineering at ETH Zurich, Switzerland where she leads the Intelligent Control Systems group. She received the Diploma degree in engineering cybernetics from the University of Stuttgart, Germany, in 2006, and the Ph.D. degree with honors in electrical engineering from ETH Zurich, Switzerland, in 2011. From 2011 to 2012 she was a Postdoctoral Fellow with the Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland. She was a Postdoctoral Researcher with the Max Planck Institute for Intelligent Systems, Tübingen, Germany until 2015 and with the Department of Electrical Engineering and Computer Sciences at the University of California at Berkeley, CA, USA, from 2012 to 2014. From 2018 to 2019 she was a professor at the University of Freiburg, Germany. Her awards include the ETH medal for her PhD thesis, a Marie-Curie IO fellowship and an SNF Professorship grant. She is one of the organizers of the new Conference on Learning for Dynamics and Control (L4DC). Her research interests include safe learning-based control, as well as distributed control and optimization, with applications to robotics and human-in-the-loop control.

Nov 11, 2020

Sasha Rakhlin (MIT)
Title: A New Approach to Contextual Bandits

Abstract & Bio

Abstract: A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks. In this talk, we will describe a universal and optimal reduction from contextual bandits to online regression. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the online regression method is optimal. We then turn to the case of iid data and present an adaptive method that attains fast instance-dependent rates, whenever certain disagreement-based notions of problem complexity are bounded. Time permitting, we extend these ideas to the setting of block-MDPs and provide oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.
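
As a rough sketch of what such a reduction looks like in practice (an editorial illustration of inverse-gap weighting; the exploration parameter gamma and the reward-prediction oracle are placeholders, not the paper's exact tuning):

```python
# Inverse-gap weighting: turn reward predictions from an online regression
# oracle into an action distribution that explores actions with small gaps.
import numpy as np

def action_distribution(pred_rewards, gamma=100.0):
    pred_rewards = np.asarray(pred_rewards, dtype=float)
    K = len(pred_rewards)
    best = int(np.argmax(pred_rewards))
    p = 1.0 / (K + gamma * (pred_rewards[best] - pred_rewards))
    p[best] = 0.0
    p[best] = 1.0 - p.sum()          # remaining mass goes to the greedy action
    return p

rng = np.random.default_rng(0)
p = action_distribution([0.2, 0.5, 0.45])
action = rng.choice(3, p=p)          # sample an arm, observe a reward, update the oracle
print(p, action)
```

After sampling an action and observing its reward, the pair is simply fed back to the online regression oracle; the bandit layer itself never touches the underlying function class.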

Joint work with Dylan Foster, David Simchi-Levi, and Yunzong Xu

Bio: Alexander (Sasha) Rakhlin is an Associate Professor at MIT, with appointments in the Statistics & Data Science Center and the Department of Brain and Cognitive Sciences. Sasha is currently the Chair of the Interdisciplinary Doctoral Program in Statistics at MIT. His research is in Statistics and Machine Learning. Sasha received his bachelor’s degrees in mathematics and computer science from Cornell University, and doctoral degree in computational neuroscience from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an associate professor in the Department of Statistics and a co-director of the Penn Research in Machine Learning (PRiML) center. He is a recipient of the NSF CAREER award, IBM Research Best Paper award, Machine Learning Journal award, and COLT Best Paper Award.




Oct 28, 2020

Jeff Shamma (KAUST)
Title: The Long and Short of Stochastic Stability in Multi-Agent Systems

Abstract & Bio

Abstract: System models often include the presence of relatively infrequent random events, such as exploration, mutations, or errors. The concept of stochastic stability concerns how such random effects can impact long run behavior, even as they become progressively infrequent. This talk presents an overview of stochastic stability as applied to finite state Markov chain models. The talk begins with a tutorial introduction to stochastic stability and its application in a variety of settings, with a particular emphasis on multi-agent systems, namely: (1) large population signaling games, (2) coordination games, and (3) programmable self-assembly. The talk continues with a comparison and contrast of stochastic stability with the concept of an evolutionarily stable strategy (ESS), a related notion that also examines the effect of random perturbations in multi-agent population dynamics. The comparison is through the introduction of a so-called transitive-stability graph, or TS-graph, that leverages the definition of an ESS to reach stochastic stability conclusions. The talk concludes with a discussion of how short and medium run behavior can distinguish dynamics that exhibit identical long run stochastic stability properties.

Bio: Jeff S. Shamma is a Professor of Electrical and Computer Engineering at the King Abdullah University of Science and Technology (KAUST). At the end of the year, he will join the University of Illinois at Urbana-Champaign as the Department Head of Industrial and Enterprise Systems Engineering (ISE) and Jerry S. Dobrovolny Chair in ISE. Jeff received a Ph.D. in systems science and engineering from MIT in 1988. He is a Fellow of IEEE and IFAC; a recipient of the IFAC High Impact Paper Award, AACC Donald P. Eckman Award, and NSF Young Investigator Award; and a past Distinguished Lecturer of the IEEE Control Systems Society. Jeff is currently serving as the Editor-in-Chief for the IEEE Transactions on Control of Network Systems.



Oct 21, 2020

Tamer Başar (UIUC)

Title: Policy Optimization for Linear Optimal Control with Guarantees of Robustness


Abstract & Bio

Abstract: Policy optimization (PO) is a key ingredient of modern reinforcement learning (RL), and can be used for the efficient design of optimal controllers. For control design, certain constraints are generally enforced on the policies to be implemented, such as stability, robustness, and/or safety concerns on the closed-loop system. Hence, PO entails, by its nature, a constrained optimization in most cases, which is also nonconvex, and analysis of its global convergence is generally very challenging. Further, another element that compounds the challenge is that some of the constraints that are safety-critical, such as closed-loop stability or the H-infinity (H∞) norm constraint that guarantees the system robustness, can be difficult to enforce on the controller while it is being learned as the PO methods proceed. We have recently overcome this difficulty for a special class of such problems, which I will discuss in this presentation.

Specifically, I will introduce the problem of PO for H2 linear control with a guarantee of robustness according to the H∞ criterion, for both continuous- and discrete-time linear systems. I will argue, with justification, that despite the nonconvexity of the problem, PO methods can enjoy the global convergence property. More importantly, I will show that the iterates of two specific PO methods (namely, natural policy gradient and Gauss-Newton) automatically preserve the H∞ norm (i.e., the robustness) during iterations, thus enjoying what we refer to as the “implicit regularization” property. Furthermore, under certain conditions, convergence to the globally optimal policies features globally sub-linear and locally super-linear rates. Due to the inherent equivalence of this optimal robust control model to risk-sensitive linear control and linear quadratic (LQ) dynamic games, these results also apply as a byproduct to these settings, with some adjustments. The latter, in particular, entails PO with two agents, and the order in which the updates are done becomes a challenging issue, which I will also discuss. The talk will conclude with some informative simulations, and a brief discussion of extensions to the model-free framework and associated sample complexity analyses.
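
In schematic form (an editorial gloss), the design problem is

$\min_{K}\ J_{2}(K) \quad \text{subject to} \quad \|\mathcal{T}_{zw}(K)\|_{\infty} < \gamma$,

where $J_2$ is the $H_2$ (LQ) cost of the closed loop under the state feedback $u = -Kx$ and $\mathcal{T}_{zw}(K)$ is the closed-loop transfer function from disturbance to performance output; the implicit-regularization result says that the natural policy gradient and Gauss-Newton iterates keep the constraint satisfied without any explicit projection.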

(Based on joint work with Kaiqing Zhang and Bin Hu, UIUC)

Bio: Tamer Başar has been with the University of Illinois at Urbana-Champaign since 1981, where he holds the academic positions of Swanlund Endowed Chair; Center for Advanced Study (CAS) Professor of Electrical and Computer Engineering; Professor, Coordinated Science Laboratory; Professor, Information Trust Institute; and Affiliate Professor, Mechanical Science and Engineering. He is also the Director of the Center for Advanced Study. At Illinois, he has also served as Interim Dean of Engineering and Interim Director of the Beckman Institute for Advanced Science and Technology. He is a member of the US National Academy of Engineering; Fellow of IEEE, IFAC, and SIAM; a past president of the IEEE Control Systems Society (CSS), the founding president of the International Society of Dynamic Games (ISDG), and a past president of the American Automatic Control Council (AACC). He has received several awards and recognitions over the years, including the highest awards of IEEE CSS, IFAC, AACC, and ISDG, the IEEE Control Systems Technical Field Award, and a number of international honorary doctorates and professorships, most recently an honorary doctorate from KTH, Sweden. He has over 900 publications in systems, control, communications, optimization, networks, and dynamic games, including books on non-cooperative dynamic game theory, robust control, network security, wireless and communication networks, and stochastic networks. He was Editor-in-Chief of the IFAC Journal Automatica between 2004 and 2014, and is currently editor of several book series. His current research interests include stochastic teams, games, and networks; multi-agent systems and learning; data-driven distributed optimization; epidemics modeling and control over networks; security and trust; energy systems; and cyber-physical systems.

Oct 7, 2020

Babak Hassibi (Caltech)

Title: Regret-Optimal Control

Abstract & Bio

Abstract: A major challenge in control is that current actions have an effect on future performance, whereas the future (in terms of measurements, disturbance signals, etc.) is unknown. To date, two major approaches to deal with future uncertainty have been studied by control theorists: stochastic control (such as LQG) where the statistical properties of the signals are assumed to be known and average future performance is optimized, and robust control (such as H-infinity) where the worst-case future performance is optimized. Stochastic control is known to be sensitive to deviations from the assumed statistical model, and robust control is known to often be too conservative because it safeguards against the worst-case. Motivated by learning theory, as a criterion for controller design, we propose to use regret, defined as the difference between the performance of a causal controller (that has only access to past and current disturbances) and that of a clairvoyant controller (that also has access to future disturbances). The resulting controller has the interpretation of guaranteeing the smallest possible regret compared to the best non-causal controller, no matter what the disturbances are. In the full-information LQR setting, we show that the regret-optimal control problem can be reduced to the classical Nehari problem. We obtain explicit formulas for the optimal regret and for the regret-optimal controller, which turns out to be the sum of the classical $H_2$ state-feedback law and an $n$-th order controller (where $n$ is the state dimension of the plant). Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\infty$ optimal controllers, and generally has $H_2$ and $H_\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control system design. We will also discuss ramifications and generalizations of the results.
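
In symbols (an editorial paraphrase of the criterion described above), the regret-optimal controller solves

$\min_{K\ \mathrm{causal}}\ \sup_{w \neq 0}\ \frac{J(K,w) - J(K_{\mathrm{nc}},w)}{\|w\|^{2}}$,

where $J(K,w)$ is the quadratic cost incurred under the disturbance sequence $w$ and $K_{\mathrm{nc}}$ is the clairvoyant (non-causal) controller that sees the entire disturbance sequence in advance.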

Bio: Babak Hassibi is the inaugural Mose and Lillian S. Bohn Professor of Electrical Engineering at the California Institute of Technology, where he has been since 2001. From 2011 to 2016, he was the Gordon M Binder/Amgen Professor of Electrical Engineering, and during 2008-2015, he was Executive Officer of Electrical Engineering, as well as Associate Director of Information Science and Technology. Prior to Caltech, he was a Member of the Technical Staff in the Mathematical Sciences Research Center at Bell Laboratories, Murray Hill, NJ. He obtained his PhD degree from Stanford University in 1996 and his BS degree from the University of Tehran in 1989. His research interests span various aspects of information theory, communications, signal processing, control and machine learning. He is an ISI highly cited author in Computer Science and, among other awards, is the recipient of the US Presidential Early Career Award for Scientists and Engineers (PECASE) and the David and Lucille Packard Fellowship in Science and Engineering.

Sep 30, 2020

Elad Hazan (Princeton University)
Title: The Non-Stochastic Control Problem

Abstract

Linear dynamical systems are a continuous subclass of reinforcement learning models that are widely used in robotics, finance, engineering, and meteorology. Classical control, since the work of Kalman, has focused on dynamics with Gaussian i.i.d. noise, quadratic loss functions and, in terms of provably efficient algorithms, known systems and observed state. We'll discuss how to apply new machine learning methods which relax all of the above: efficient control with adversarial noise, general loss functions, unknown systems, and partial observation.
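
A representative policy class from this line of work (an editorial pointer; the exact parameterizations vary across the papers below) is the disturbance-action controller

$u_t = -K x_t + \sum_{i=1}^{H} M^{[i]} w_{t-i}$,

where $K$ is a fixed stabilizing gain, the $w_{t-i}$ are past disturbances recovered from observed states, and the matrices $M^{[i]}$ are updated by online convex optimization against adversarially chosen losses and disturbances.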

Based on a series of works with Naman Agarwal, Nataly Brukhim, Karan Singh, Sham Kakade, Max Simchowitz, Cyril Zhang, Paula Gradu, John Hallman, Xinyi Chen

Bio

Elad Hazan is a professor of computer science at Princeton University. His research focuses on the design and analysis of algorithms for basic problems in machine learning and optimization. Amongst his contributions are the co-development of the AdaGrad optimization algorithm, and the first sublinear-time algorithms for convex optimization. He is the recipient of the Bell Labs Prize, the IBM Goldberg Best Paper Award (in 2008 and 2012), a European Research Council grant, a Marie Curie fellowship, and a Google Research Award (twice). He served on the steering committee of the Association for Computational Learning and has been program chair for COLT 2015. In 2017 he co-founded In8 Inc., focused on efficient optimization and control, which was acquired by Google in 2018. He is the co-founder and director of Google AI Princeton.