Safe Reinforcement Learning Online Seminar

Get involved: We welcome researchers and students interested in safe RL to join us! To receive seminar announcements promptly, please register via the following link.

Purpose

Reinforcement learning (RL) algorithms that satisfy safety constraints are crucial for real-world applications. The development of safe RL algorithms has received substantial attention in recent years, yet several challenges remain unsolved, such as how to ensure safety when deploying RL methods in real-world applications. We are organizing this Safe RL Seminar to discuss recent advances and open challenges in safe RL with researchers from academia and industry.

Current Seminar

Talk Title: Contextual Bandits with Constraints Revisited: A Modular Approach with Improved Rates

Talk Time: 1 August at 16:00 CEST (07:00 Pacific Time, 10:00 Eastern Time, 22:00 Beijing Time)

Host: Shangding Gu

Abstract: In this talk, I will share some recent progress on contextual bandits with unknown constraints. In this setting, the learner aims to maximize reward while satisfying multiple unknown constraints, and the goal is to achieve sublinear regret and sublinear cumulative constraint violation. Previous work by [SSF’23] proposed a no-regret-dynamics-based approach that, under the strong assumption that an optimal solution is feasible by a constant margin, achieves optimal regret and sublinear constraint violation. We improve on this by showing that one can achieve optimal regret and even zero cumulative constraint violation under the much weaker Slater’s condition, i.e., that there exists some solution (rather than an optimal one) that is feasible by a constant margin. This is achieved through a new analysis of their Lagrangian approach, relying on insights from saddle-point optimization and tools from constrained convex optimization. We further show that some previous algorithms (e.g., UCB-type algorithms) can also be viewed as instances of our general framework. Finally, I will discuss some concurrent work on constrained contextual bandits. This is based on joint work with Alex, Karthik, and Dylan.
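For readers less familiar with this setting, the following is a minimal formalization of the two performance measures mentioned in the abstract (the notation here is ours, not necessarily the speaker's): let $x_t$ denote the context and $\pi_t$ the learner's policy at round $t$, with unknown reward function $r$ and unknown constraint functions $g_i$ that a feasible policy must keep at or below zero. Over $T$ rounds,

$$
\mathrm{Regret}(T) = \sum_{t=1}^{T}\Big(r(\pi^{\star}, x_t) - r(\pi_t, x_t)\Big),
\qquad
\mathrm{Violation}_i(T) = \Big[\sum_{t=1}^{T} g_i(\pi_t, x_t)\Big]_{+},
$$

where $\pi^{\star}$ is the best feasible policy and $[\cdot]_{+} = \max\{\cdot, 0\}$. "Sublinear" means both quantities grow as $o(T)$. In this language, Slater's condition asks only that some policy $\bar{\pi}$ satisfy $g_i(\bar{\pi}) \le -\delta$ for a margin $\delta > 0$, whereas the assumption in [SSF’23] requires this margin to hold for an optimal policy.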

Bio: Xingyu Zhou is an Assistant Professor in the Department of Electrical and Computer Engineering (ECE) at Wayne State University. He received his Ph.D. from Ohio State University (advised by Ness Shroff), his master's degree from Tsinghua University, and his bachelor's degree from BUPT, all with the highest honors. His research interests include machine learning (e.g., bandits, reinforcement learning), stochastic systems, and applied probability (e.g., load balancing). He is currently particularly interested in online decision-making with formal privacy and/or robustness guarantees. His research has led to invited talks at Caltech, CMU, and UCLA, as well as a Best Student Paper Award and runner-up awards. He is also the recipient of various other honors, including the NSF CRII Award, the Presidential Fellowship at OSU, the Outstanding Graduate Award of Beijing City, the National Scholarship of China, the Academic Rising Star Award at Tsinghua University, and the Dec. 9th Scholarship of Tsinghua University. He has served on the technical program committees of conferences and workshops including Sigmetrics, MobiHoc, INFOCOM, and TPDP, and as an Area Chair for NeurIPS.

Organizers:

Shangding Gu (UC Berkeley)

Josip Josifovski (TUM)

Yali Du (KCL)

Alap Kshirsagar (TU Darmstadt)

Yuhao Ding (UC Berkeley) 

Ming Jin (Virginia Tech)

Advisors:

Alois Knoll (TUM)

Jan Peters (TU Darmstadt)

Shie Mannor (Israel Institute of Technology & Nvidia Research)

Jun Wang (UCL)

Costas Spanos (UC Berkeley)


Recordings will be released on the Safe RL YouTube Channel, but only after we receive the speaker's permission to make them public.