Safe Reinforcement Learning Online Seminar

Get involved: We welcome researchers and students who are interested in safe RL to join us! To receive seminar announcements in a timely manner, please register via the following link.

Purpose

Reinforcement learning (RL) algorithms that satisfy safety constraints are crucial for real-world applications. The development of safe RL algorithms has received substantial attention in recent years, yet several challenges remain open, such as how to ensure safety when deploying RL methods in real-world applications. We are organizing this Safe RL Seminar to discuss recent advances and open challenges in safe RL with researchers from academia and industry.

Current Seminar

Dr. Laixi Shi (Caltech)

Talk Title: The Curious Price of Distributional Robustness in Reinforcement Learning: Towards Provably Optimal Sample Efficiency

Talk Time: 06:00 PM-07:00 PM, 26.10.2023 (CET)

Host: Shangding Gu

Abstract: Reinforcement learning (RL), which strives to learn desirable sequential decisions based on trial-and-error interactions with an unknown environment, has recently achieved remarkable success in a variety of domains, including games and large language model alignment. While standard RL has been heavily investigated, a policy learned in an ideal, nominal environment might fail catastrophically when the deployed environment is subject to small changes in task objectives or adversarial perturbations, especially in high-stakes applications such as robotics and clinical trials.

This talk concerns the central issues of model robustness and sample efficiency in reinforcement learning (RL), with the goal of reducing the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), which aims to learn a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs has remained largely unsettled regardless of the uncertainty set in use, and it was unclear whether distributional robustness bears any statistical consequences when benchmarked against standard RL. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs: the statistical cost incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set. In addition, we break down the sample barrier of robust RL in the offline setting by providing the first provably near-optimal algorithm for offline robust RL that can learn under simultaneous model uncertainty and limited historical data.
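For readers new to the topic, the LaTeX snippet below is a minimal sketch of the worst-case objective behind RMDPs as described above; the notation (nominal kernel P^0, uncertainty radius sigma, initial distribution rho) is illustrative and not necessarily the exact formulation used in the talk.

% Illustrative RMDP objective: optimize the worst case over an uncertainty set around the nominal model.
\[
  \widehat{\pi} \in \arg\max_{\pi} \; \min_{P \in \mathcal{U}^{\sigma}(P^{0})} V^{\pi}_{P}(\rho),
  \qquad
  V^{\pi}_{P}(\rho) := \mathbb{E}_{s_0 \sim \rho,\; a_t \sim \pi,\; s_{t+1} \sim P}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big].
\]

In this sketch, standard RL corresponds to the degenerate case where the uncertainty set shrinks to the nominal model alone, which is why the comparison between RMDPs and standard MDPs hinges on the size and shape of the uncertainty set.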

Bio: Laixi Shi is a postdoctoral fellow in the Department of Computing and Mathematical Sciences at the California Institute of Technology (Caltech). She received her Ph.D. from CMU in August 2023 and completed her B.S. in Electronic Engineering at Tsinghua University from 2014 to 2018. She has also interned at the Google Research Brain Team and at Mitsubishi Electric Research Laboratories. Her research interests include reinforcement learning (RL), non-convex optimization, high-dimensional statistical estimation, and robotics, ranging from theory to applications. Her current research focuses on 1) theoretical work: designing provably sample-efficient algorithms for value-based RL, offline RL, and robust RL, drawing on tools from optimization and statistics; and 2) practical work: applying deep reinforcement learning (DRL) algorithms to large-scale problems such as robotics, Atari games, and web navigation.

Organizers:

Shangding Gu (UC Berkeley)

Josip Josifovski (TUM)

Yali Du (KCL)

Alap Kshirsagar (TU Darmstadt)

Yuhao Ding (UC Berkeley) 

Ming Jin (Virginia Tech)

Advisors:

Alois Knoll (TUM)

Jan Peters (TU Darmstadt)

Shie Mannor (Israel Institute of Technology & Nvidia Research)

Jun Wang (UCL)

Costas Spanos (UC Berkeley)


If we receive the speaker's permission, we will release the recorded videos on the Safe RL YouTube Channel; videos are made public only after permission has been granted.