Past Seminars

Yang Gao (Tsinghua University)

Talk Title:  The Future of Large Embodied Models

Talk Time: 30th of May at 16:00 CEST (07:00 California time, 22:00 Beijing time)

Host: Shangding Gu

Abstract: Embodied intelligence is one of the important milestones of artificial intelligence. In this talk, I will introduce several important data sources for large embodied models and how to train such models from these data: Internet video data, pre-trained large vision-language models, imitation learning data, and reinforcement learning data. Specifically, I will introduce the following work: (1) General Flow, which learns world priors from human video data; (2) ViLa and CoPa, which extract world priors from pre-trained large vision-language models; and (3) Foundation RL, which employs large models to assist real-world RL and greatly improve the success rate.

Bio: Yang Gao is an assistant professor at IIIS, Tsinghua University. Before that, he received his Ph.D. from UC Berkeley, advised by Prof. Trevor Darrell, and spent a year at Berkeley as a postdoc working with Trevor Darrell and Pieter Abbeel. He is mainly interested in computer vision and robotic learning. Earlier, he graduated from the Computer Science Department at Tsinghua University, where he worked with Prof. Jun Zhu on Bayesian inference. He interned at Google Research on natural language processing from 2011 to 2012 with Dr. Edward Y. Chang and Dr. Fangtao Li, with Waymo's autonomous driving team during the summer of 2016, and on autonomous driving at Intel Research during the summer of 2018 with Dr. Vladlen Koltun.

Dr. Laixi Shi (Caltech)

Talk Title:  The Curious Price of Distributional Robustness in Reinforcement Learning: Towards Provably Optimal Sample Efficiency

Talk Time: 06:00 PM-07:00 PM, 26.10.2023 (CET Time)

Host: Shangding Gu

Abstract: Reinforcement learning (RL), which strives to learn desirable sequential decisions based on trial-and-error interactions with an unknown environment, has recently achieved remarkable success in a variety of domains, including games and large language model alignment. While standard RL has been heavily investigated, a policy learned in an ideal, nominal environment might fail catastrophically when the deployed environment is subject to small changes in task objectives or adversarial perturbations, especially in high-stakes applications such as robotics and clinical trials.

This talk concerns the central issues of model robustness and sample efficiency in RL, with the goal of reducing the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained mostly unsettled, regardless of the uncertainty set in use. It was unclear whether distributional robustness bears any statistical consequences when benchmarked against standard RL. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs: the statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set. In addition, we break down the sample barrier of robust RL in the offline setting by providing the first provably near-optimal algorithm for offline robust RL that can learn under simultaneous model uncertainty and limited historical data.
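
For readers less familiar with the setting, the robust value function at the heart of an RMDP can be written (in a generic, (s,a)-rectangular form; the talk's exact formulation may differ) as

\[
V^{\mathrm{rob}}(s) \;=\; \max_{a}\Big[\, r(s,a) \;+\; \gamma \min_{P \,\in\, \mathcal{U}^{\sigma}(P^{0}_{s,a})} \mathbb{E}_{s' \sim P}\big[V^{\mathrm{rob}}(s')\big] \Big],
\]

where \(P^{0}\) is the nominal transition kernel and \(\mathcal{U}^{\sigma}(\cdot)\) is an uncertainty set of radius \(\sigma\) (e.g., measured in total variation or \(\chi^2\) divergence). The size and shape of this set are precisely what drive the sample-complexity gaps relative to standard MDPs described above.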

Bio: Laixi Shi is a postdoctoral fellow in the Department of Computing and Mathematical Sciences at the California Institute of Technology (Caltech). She received her Ph.D. from CMU in August 2023 and completed her B.S. in Electronic Engineering at Tsinghua University (2014-2018). She has also interned at the Google Research Brain Team and Mitsubishi Electric Research Laboratories. Her research interests include reinforcement learning (RL), non-convex optimization, high-dimensional statistical estimation, and robotics, ranging from theory to applications. Her current research focuses on (1) theory: designing provably sample-efficient algorithms for value-based RL, offline RL, and robust RL, drawing on tools from optimization and statistics; and (2) practice: deep reinforcement learning (DRL) algorithms for large-scale problems such as robotics, Atari games, and web navigation.

Oswin So (MIT)

Talk Title:   Solving Stabilize-Avoid Optimal Control via Epigraph Form and Deep Reinforcement Learning

Talk Time: 04:00 PM-05:00 PM, 02.08.2023 (CET Time)

Host: Shangding Gu

Abstract: Tasks for autonomous robotic systems can be abstracted as asking for stabilization to a desired region while upholding safety specifications. Solving this multi-objective problem is still challenging when the dynamics are nonlinear and high-dimensional, as traditional methods do not scale well and are often limited to specific problem structures. In this talk, I will present our recent work that approaches the stabilize-avoid problem using tools from control, optimization, and deep reinforcement learning. We solve the stabilize-avoid problem via the solution of an infinite-horizon constrained optimal control problem (OCP). We propose a novel approach that transforms the constrained OCP into epigraph form, resulting in a two-stage optimization problem that optimizes over the policy in the inner problem and over an auxiliary variable in the outer problem. Our method admits better stability during training and is not restricted to specific problem structures, in contrast to previous approaches. We validate our approach on benchmark tasks ranging from low-dimensional toy examples to a fixed-wing jet with a 17-dimensional state space. Simulation results show that our approach consistently yields controllers that match or exceed the safety of existing methods while achieving ten-fold improvements in stability performance through larger regions of attraction.
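
To make the epigraph idea concrete, here is the generic reformulation it rests on (a textbook sketch, not the exact stabilize-avoid formulation from the talk). A constrained problem \(\min_{\pi} J(\pi)\) subject to \(h(\pi) \le 0\) can be rewritten with an auxiliary variable \(z\) as

\[
\min_{z} \; z
\qquad \text{subject to} \qquad
\min_{\pi} \, \max\big\{\, J(\pi) - z,\; h(\pi) \,\big\} \;\le\; 0,
\]

which yields exactly the two-stage structure described above: for a fixed \(z\), the inner problem optimizes the policy against an unconstrained min-max objective, while the outer problem searches over the scalar \(z\).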

Bio: Oswin So is a first-year PhD student in the Department of Aeronautics and Astronautics at MIT, working under the supervision of Chuchu Fan. He received his B.S. from Georgia Tech. His research interests lie in the analysis of reinforcement learning from the perspective of control.

Dr. Dongsheng Ding (University of Pennsylvania)

Talk Title:  Provable constrained policy optimization in reinforcement learning  

Talk Time: 04:00 PM-05:00 PM, 28.06.2023 (CET Time) 

Host: Shangding Gu

Abstract: Constrained policy optimization is a prominent methodology in modern reinforcement learning (RL) for discovering optimal policies under specific constraints. Its applications span diverse domains, including robot navigation, autonomous driving, video compression, power control, and cancer screening. Despite its empirical success, the convergence properties of associated RL algorithms remain inadequately understood. To address this gap, in this talk I will present our work on establishing constrained policy search algorithms with provable convergence guarantees. Our proposed approach employs a primal-dual method, leveraging natural policy gradient ascent to search for the policy as a primal variable, while simultaneously utilizing dual sub-gradient descent to adjust the price associated with violating constraints. I will cover the convergence rate analysis of this primal-dual method, and its generalization to problems featuring large state/action spaces through function approximation. Moreover, I will delve into the sample-based implementation of our methods and their finite-sample complexity guarantees. Furthermore, if time allows, I will discuss our recent advancements in enhancing the convergence from the traditional average sense to the more desirable last-iterate sense. 
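
To give a flavour of the primal-dual structure (a schematic sketch of one representative update rule, not necessarily the exact algorithm presented in the talk), consider a constraint of the form \(V_c^{\pi_\theta}(\rho) \ge b\). The method alternates a natural policy gradient ascent step on the Lagrangian with a projected dual sub-gradient descent step:

\[
\theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{-1} \nabla_{\theta}\Big[ V_r^{\pi_\theta}(\rho) + \lambda_t V_c^{\pi_\theta}(\rho) \Big]\Big|_{\theta = \theta_t},
\qquad
\lambda_{t+1} = \Big[ \lambda_t - \eta' \big( V_c^{\pi_{\theta_{t+1}}}(\rho) - b \big) \Big]_{+},
\]

where \(V_r\) and \(V_c\) are the reward and constraint value functions under the initial distribution \(\rho\), \(F(\theta)\) is the Fisher information matrix of the policy, \(b\) is the constraint threshold, \(\eta, \eta'\) are step sizes, and \([\cdot]_+\) projects onto the nonnegative reals. The dual variable \(\lambda\) plays the role of the "price" of constraint violation mentioned above.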

Bio: Dongsheng Ding is a Postdoctoral Researcher in the Department of Electrical and Systems Engineering at the University of Pennsylvania. He received his Ph.D. in Electrical Engineering from the University of Southern California, his M.S. from the University of Minnesota Twin Cities, and his B.E. and M.E. degrees from Zhejiang University. His research interests lie in Optimization, Control, Reinforcement Learning, and their interfaces. 

Puze Liu (TU Darmstadt)

Talk Title:  Safe and Reliable Robot Reinforcement Learning in Dynamic Environments 

Talk Time: 04:00 PM-05:00 PM, 28.04.2023 (CET Time) 

Host: Alap Kshirsagar 

Abstract: Dynamic tasks require robots to react and adapt quickly to environmental changes. Unfortunately, classic robotics solutions are often not flexible enough to handle these problems. Reinforcement learning, instead, allows robots to generalize skills across environment variations and cope with unforeseen tasks. In dynamic environments, however, the gap between simulation and reality often widens drastically, leading to failures in real-world deployment. Online learning and exploration that take safety into account are therefore essential for reinforcement learning on real robots. To interact safely with the real world, we develop a new safe exploration approach that constructs a safe action space tangent to the constraint manifold. We will illustrate how to utilize this safe action space for reinforcement learning in various robotic tasks such as manipulation, navigation, and human-robot interaction.
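
A minimal numpy sketch of the tangent-space idea (my own illustrative example with hypothetical names, not the speaker's implementation): given the Jacobian of the constraints at the current configuration, the RL action is mapped through a basis of the Jacobian's null space, so that the commanded velocity stays on the constraint manifold to first order.

```python
import numpy as np
from scipy.linalg import null_space

def tangent_space_action(jac_c: np.ndarray, learned_action: np.ndarray) -> np.ndarray:
    """Map an unconstrained RL action into the tangent space of the constraint
    manifold {q : c(q) = 0}, so the resulting velocity does not violate the
    constraint to first order.

    jac_c:          (m, n) Jacobian of the constraints c(q) at the current state
    learned_action: (k,)   policy output, with k = n - rank(jac_c)
    """
    basis = null_space(jac_c)       # (n, k) orthonormal basis of the tangent space
    return basis @ learned_action   # joint-space velocity lying in the tangent space

# Toy usage: a single constraint c(q) = q0 + q1 = 0 in a 2-D configuration space.
jac = np.array([[1.0, 1.0]])        # gradient (Jacobian) of the constraint
action = np.array([0.5])            # 1-D action produced by the policy
qdot = tangent_space_action(jac, action)
print(qdot, jac @ qdot)             # second term is ~0: motion stays on the manifold
```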

Bio: Puze Liu is a fourth-year PhD student in the Department of Computer Science at the Technical University of Darmstadt, supervised by Prof. Jan Peters. Prior to his PhD, he obtained his Bachelor's degree from Tongji University and his Master's degree from the Technical University of Berlin. His research focuses on the interdisciplinary aspects of robotics and machine learning; he seeks to mitigate the challenges of deploying machine learning algorithms in robotics. His current work centers on developing safe and reliable learning robots in dynamic environments. More information is available at http://puzeliu.github.io/

Zuxin Liu (CMU)

Talk Title:  Towards robust, efficient, and safe reinforcement learning

Talk Time: 04:00 PM-05:00 PM, 23.03.2023 (CET Time)

Host: Yuhao Ding

Abstract: Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While most prior works focus on performance optimality, we find that the optimal policies obtained by many safe RL algorithms are neither robust nor safe against carefully designed observational perturbations. We formally analyze the unique properties of safe RL and design effective adversarial attackers, exposing the vulnerability of deep safe RL agents. One interesting and counter-intuitive finding is that the maximum-reward attack is strong: it can induce unsafe behaviors while remaining stealthy by maintaining the task reward. We further propose efficient adversarial training methods for safe RL and show, via robotic experiments, their effectiveness in improving safety under attacks. We believe that robustness should be an important aspect of the safe RL area, because a policy that is vulnerable under attacks cannot be regarded as truly safe in the physical world. We will also briefly cover other safe RL topics, such as how to efficiently learn a safe policy from off-policy or offline data.
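
One way to formalize the maximum-reward attack sketched above (a schematic reading of the abstract, not necessarily the authors' exact definition): within a perturbation budget \(\epsilon\), the adversary replaces the true observation \(s\) with

\[
\tilde{s} \;\in\; \arg\max_{\tilde{s}\,:\,\|\tilde{s}-s\| \le \epsilon} \; Q_r\big(s, \pi(\tilde{s})\big),
\]

where \(Q_r\) is a reward critic and \(\pi\) is the victim policy. Because safe RL policies typically operate close to the constraint boundary, nudging the agent toward even more reward-greedy behavior is often enough to cause constraint violations, while the preserved task reward keeps the attack stealthy.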

Bio: Zuxin Liu is a Ph.D. student at Carnegie Mellon University. He obtained his Bachelor's degree with honors from Beihang University, China, and has spent summers interning at DJI, Nuro, and Amazon. His research interests lie in reinforcement learning, imitation learning, and their robotic applications. In particular, he is interested in how to safely, robustly, and efficiently deploy learning-based decision-making algorithms in real-world safety-critical applications. More information is available at https://www.zuxin.me

Yuhao Ding (UC Berkeley)

Talk Title:  Safe Reinforcement Learning in the Presence of Non-stationarity: Theory and Algorithms

Talk Time: 17:00-18:00, 21.02.2023 (CET)

Host: Shangding Gu

Abstract: Despite the successes of reinforcement learning (RL) in simulation-based systems such as video games and Go, existing RL techniques are not yet applicable, or are too risky to employ, in real-world autonomous systems. These applications often require safety assurance, and the underlying environment may undergo changes and be non-stationary. While both aspects have been tackled separately in the literature to some limited extent, a substantial gap remains when these issues arise simultaneously, posing challenges for the deployment of current methods in real-world systems. To overcome these challenges and realize the full potential of RL for adaptability and performance gains, we develop a new mathematical foundation and a set of computational tools for the design of safe RL algorithms that can be deployed in changing environments. Along this line, I will present work on three fronts: (1) non-stationary constrained Markov decision processes, (2) non-stationary risk-sensitive RL, and (3) meta-safe RL.
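
As background, the first of these directions builds on the standard constrained MDP (CMDP) formulation, shown here in its stationary form (the work above allows the reward, the cost, and possibly the dynamics to drift over time):

\[
\max_{\pi} \;\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\qquad \text{subject to} \qquad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \;\le\; b,
\]

where \(r\) is the reward, \(c\) is the safety cost, and \(b\) is the constraint budget. Non-stationarity makes both the objective and the feasible set moving targets, which is what the new foundations and tools are designed to handle.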

Bio: Yuhao Ding is a fifth-year Ph.D. student in the Operations Research department at UC Berkeley. His research interests include reinforcement learning, control theory, optimization, and statistical learning. His Ph.D. research focuses on non-stationary sequential decision-making problems, including time-varying optimization, the global convergence of policy gradient methods, and non-stationary reinforcement learning.

Shahin Atakishiyev (University of Alberta)

Talk Title:  Development of explainable reinforcement learning approaches for safe and interpretable autonomous driving

Talk Time: 16:00-17:00 (CET), 17.01.2023

Host: Shangding Gu

Abstract: Reinforcement learning (RL), a trial-and-error learning method, is a powerful artificial intelligence (AI) technique for sequential decision-making problems. In the context of autonomous driving, RL can be used to train a self-driving car to make real-time decisions in a dynamic environment, such as staying in a lane, detecting traffic lights, and avoiding collisions, to name a few. One challenge with RL-based autonomous driving is that an agent's decisions are hard to understand and, in many cases, not interpretable. As autonomous driving is a safety-critical application of AI, incorrect actions may lead to high-stakes consequences. Hence, it is crucial to understand and trust the decisions made by a self-driving vehicle. In this context, explainable reinforcement learning (XRL) is an emerging research area that aims to make the agent's decision-making process more interpretable and trustworthy. In this seminar, we propose potential XRL approaches to interpret the decision-making process of autonomous cars. In addition, we show that XRL not only improves the transparency of real-time decisions but also provides an opportunity to enhance the safety of an intelligent driving system.

Bio: Shahin Atakishiyev is a PhD student in Computing Science at the University of Alberta. He previously obtained an MSc in Software Engineering and Intelligent Systems from the University of Alberta, Canada, and a BSc in Computer Engineering from Qafqaz University, Azerbaijan. In his doctoral research, Shahin is developing explainable reinforcement learning (XRL) approaches for autonomous driving. He presented the preliminary findings of his study at AAAI-2022 and was awarded a prize for the presented research article. His current research is funded by the Ministry of Science and Education of the Republic of Azerbaijan, Huawei Technologies Canada, Co., Ltd, and the Department of Computing Science at the University of Alberta. More information about his work is available here: https://webdocs.cs.ualberta.ca/~atakishi/

Dr. Hosein Hasanbeig (University of Oxford)

Talk Title: Safe and Certified Reinforcement Learning with Logical Constraints

Talk Time: 15:00-16:00 (CET), 9.12.2022

Host: Shangding Gu

Abstract: Reinforcement Learning (RL) is a widely employed machine learning framework that has been applied to a variety of decision-making problems. However, RL has experienced limited success beyond rigidly controlled and carefully constrained applications, and successful employment of RL in safety-critical scenarios is yet to be achieved. A principal reason for this limitation is the lack of a systematic formal approach to specifying requirements, and to providing guarantees with respect to these requirements, during and after learning. We address these issues by proposing a general method that leverages the success of RL in learning high-performance controllers, while developing the required assumptions and theory for the satisfaction of formal requirements and for guiding the learning process within safe configurations. Given the expressive power of symbolic temporal logic in formulating control specifications, we propose the first model-free RL scheme to synthesize policies for unknown, black-box Markov Decision Processes (MDPs) under temporal logic specifications. We convert the logical specification into a finite-state machine that provides a systematic and formal reward-shaping mechanism. We further discuss the assumptions under which RL is guaranteed to synthesize control policies that satisfy the logical specification with the maximum possible probability. We show that, through logic inference, the specification can be further refined to ensure the transparency and alignment of the learning process with human interests. The performance of the proposed method is evaluated on a set of numerical examples and benchmarks, including the well-known Atari game Montezuma’s Revenge.
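
A toy sketch of the automaton-based reward shaping described above (my own minimal illustration with hypothetical names, not the construction from the talk): the logical specification is compiled into a finite-state machine, the learner acts on the product of the MDP state and the automaton state, and reward is granted when the automaton reaches an accepting state.

```python
# Toy automaton for the specification "eventually reach the goal, and never hit an obstacle".
# Automaton states: 0 = searching, 1 = accepting (goal reached), 2 = rejecting (obstacle hit).
SEARCH, ACCEPT, REJECT = 0, 1, 2

def automaton_step(q: int, label: str) -> int:
    """Advance the specification automaton given the label of the current MDP state."""
    if q == SEARCH and label == "goal":
        return ACCEPT
    if q == SEARCH and label == "obstacle":
        return REJECT
    return q  # accepting and rejecting states are absorbing

def shaped_reward(q: int, q_next: int) -> float:
    """Reward the agent only on transitions into the accepting state."""
    return 1.0 if (q != ACCEPT and q_next == ACCEPT) else 0.0

# During learning, the agent's state is the pair (mdp_state, q): at every step the
# environment labels the current MDP state, the automaton is advanced with that label,
# and shaped_reward(q, q_next) is used in place of (or on top of) the native reward.
q = SEARCH
for label in ["empty", "empty", "goal"]:   # labels observed along a toy trajectory
    q_next = automaton_step(q, label)
    print(label, shaped_reward(q, q_next))
    q = q_next
```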

Bio: Hosein Hasanbeig completed his Ph.D. in 2020 in the Computer Science Department of the University of Oxford, under the supervision of Alessandro Abate and Daniel Kroening. He had the honour of serving as a lecturer at St Catherine's College, University of Oxford, teaching Computer-aided Formal Verification from 2018 to 2022. Before Oxford, he was a research assistant in the Systems Control Lab at the University of Toronto, where he received his M.Sc. in 2016. His research focuses on the design and analysis of safe, interpretable, and explainable machine learning algorithms for decision-making problems. Most of his work is at the intersection of reinforcement learning, automatic control, formal methods, and game theory.