Programme

Schedule

The workshop will be a 1-day event on July 23, 2022. It will feature 7 invited talks, with the remaining slots filled by contributed talks. The invited speakers are drawn from the various approaches to safe RL, so as to capture the full interdisciplinary nature of the ongoing research.

The full schedule is given below:

Safe RL 2022 Schedule

Invited talks


Yash Chandak

University of Massachusetts Amherst

Safe RL: Going Beyond Expected Cost/Return Metrics and Stationarity Assumptions

Abstract: The literature on safe RL has predominantly focused on safety in terms of the expected value of a performance measure called the return, typically under the assumption that the environment is stationary. However, real-world applications involve critical systems with financial risks and human-life risks. Therefore, we (a) need to account for various performance statistics (e.g., (conditional) value at risk (CVaR), quantiles, variance) to quantify risk better, (b) do so using off-policy data, so that the behavior of a new policy can be predicted before that policy is even deployed, while (c) also dealing with non-stationarity in the environment. In this talk, I will discuss our steps towards a universal off-policy estimator (UnO), one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. UnO can estimate and simultaneously bound the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, I will also discuss UnO's applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, and smoothly non-stationary settings, as well as settings with discrete distribution shift.
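
As a concrete illustration of the return-distribution statistics mentioned in the abstract, the short Python sketch below computes point estimates of the mean, variance, median, inter-quantile range, and CVaR from a batch of Monte Carlo return samples. This is only a toy, on-policy illustration; UnO itself additionally works from off-policy data and provides high-confidence bounds. The function and the simulated returns are assumptions made purely for this example.

# Toy illustration (not UnO itself): point estimates of several statistics
# of the return distribution from Monte Carlo return samples of a fixed
# policy. UnO additionally handles off-policy data and provides
# high-confidence bounds; none of that is shown here.
import numpy as np

def return_statistics(returns, alpha=0.1):
    """Mean, variance, median, inter-quantile range, VaR, and CVaR_alpha
    (the expected return within the worst alpha-fraction of outcomes)."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean()
    var = returns.var(ddof=1)
    median = np.quantile(returns, 0.5)
    iqr = np.quantile(returns, 0.75) - np.quantile(returns, 0.25)
    var_alpha = np.quantile(returns, alpha)             # value at risk (alpha-quantile)
    cvar_alpha = returns[returns <= var_alpha].mean()   # expectation of the lower tail
    return {"mean": mean, "variance": var, "median": median,
            "IQR": iqr, "VaR": var_alpha, "CVaR": cvar_alpha}

# Example: 10,000 simulated episode returns from some policy.
rng = np.random.default_rng(0)
sample_returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
print(return_statistics(sample_returns, alpha=0.05))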


Bio: Yash is a fifth-year Ph.D. candidate at the University of Massachusetts Amherst, U.S., where he is a member of the Autonomous Learning Lab (ALL) and is advised by Prof. Philip Thomas. His primary interest is in reinforcement learning, specifically sub-topics related to non-stationarity, safety, off-policy data, and other challenges stemming from real-world applications. His work has been accepted at ICML, NeurIPS, and AAAI, among other venues. He has been awarded the Google Ph.D. Fellowship, received an outstanding student paper honorable mention at AAAI, and has interned at Adobe Research and DeepMind.

Chih-Hong Cheng

Fraunhofer IKS

What should be included in a safety standard for RL?

Abstract: The purpose of this informal talk is to discuss interactively with participants what should be included in the ISO safety standard for AI in automotive that is currently under development, reflecting the RL view. I will highlight some of my concerns and proposed recommendations, hoping to collect feedback from the audience.


Bio: Chih-Hong Cheng is currently a researcher at Fraunhofer IKS. His research interests include software engineering, formal methods, and AI/ML for trustworthy autonomy. He received his doctoral degree in CS from the Technical University of Munich. Before Fraunhofer IKS, he held research positions at DENSO, fortiss, and ABB.

Felix Berkenkamp

Bosch Center for AI

Missing Pieces towards Safe Model-based Reinforcement Learning

Abstract: Model-based reinforcement learning holds the promise of enabling data-efficient reinforcement learning in the real world. However, on real-world systems we face additional challenges such as safety constraints, scarce data, and inaccurate models. In this talk, we discuss how these challenges interact, identify missing pieces towards safe model-based RL, and take first steps towards addressing them.



Bio: Felix Berkenkamp is a research scientist at the Bosch Center for Artificial Intelligence, where he leads the reinforcement learning activity. His main interest lies in the fundamental problems behind data-efficient and safe reinforcement learning required to enable learning on real-world systems. Previously, he was the workflow co-chair for ICML 2018 and received the ELLIS PhD Award (2020) for his PhD thesis at ETH Zurich, where he worked with Andreas Krause and Angela Schoellig. During this time, he held an AI fellowship from the Open Philanthropy Project, was an associated fellow at the Max Planck ETH Center for Learning Systems, and was a postgraduate affiliate at the Vector Institute.




Sanjit Seshia

University of California, Berkeley

Environment Modelling for Verified Reinforcement Learning

Abstract: Reinforcement learning (RL) has been shown to be an effective method for decision making and problem solving in uncertain and unknown environments. However, more needs to be done to achieve the goal of Verified RL, i.e., achieving strong, ideally provable, assurances of correctness and trustworthiness of RL systems. Formal verification of any system is with respect to an operating environment; however, how do we perform formal verification when the environment is uncertain and unknown? In this talk, I will discuss the challenge of environment modelling for learning systems with a particular focus on RL. I will describe some progress towards addressing the environment modelling challenge, and offer my viewpoint on the important problems that remain to be solved.

Bio: Sanjit A. Seshia is the Cadence Founders Chair Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received an M.S. and Ph.D. in Computer Science from Carnegie Mellon University, and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Bombay. His research interests are in formal methods for dependable and secure computing, with a current focus on the areas of cyber-physical systems, computer security, machine learning, and robotics. He has made pioneering contributions to the areas of satisfiability modulo theories (SMT), SMT-based verification, and inductive program synthesis. He is co-author of a widely-used textbook on embedded, cyber-physical systems and has led the development of technologies for cyber-physical systems education based on formal methods. His awards and honors include a Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, the Frederick Emmons Terman Award for contributions to electrical engineering and computer science education, the Donald O. Pederson Best Paper Award for the IEEE Transactions on CAD, the IEEE Technical Committee on Cyber-Physical Systems (TCCPS) Mid-Career Award, and the Computer-Aided Verification (CAV) Award for pioneering contributions to the foundations of SMT solving. He is a Fellow of the ACM and the IEEE.

Ruzica Piskac

Yale University

Accountable AI-based Software in Complex Sociotechnical Context

Abstract: Modern software and cyberphysical systems face open-ended tasks in complex environments, rendering accountability in the event of harm or injury an ever-growing challenge for both social and technical processes. Although well-understood techniques can judge whether programs obey formal properties, the real-world assurance that this process provides depends on its scope and precision. Harms can even occur when every agent operates correctly according to its model of the system and knowledge of its state. An understanding of the contribution of autonomous agents to a harm is necessary in order to consider counterfactuals and verify whether those agents acted appropriately. Philosophy and law employ decision artifacts such as beliefs, desires, and intentions as the basis for such assessments, motivating an understanding of how they arise within modern software systems. In this talk we will describe how to use formal reasoning to assure the machine analogues of these decision artifacts will be faithfully recorded for accountability processes.


Bio: Ruzica Piskac is an associate professor of computer science at Yale University. Her research interests span the areas of programming languages, software verification, automated reasoning, and code synthesis. A common thread in Ruzica's research is improving software reliability and trustworthiness using formal techniques. Ruzica joined Yale in 2013 as an assistant professor; prior to that, she was an independent research group leader at the Max Planck Institute for Software Systems in Germany. In July 2019, she was named the Donna L. Dubinsky Associate Professor of Computer Science, one of the highest recognitions that an untenured faculty member at Yale can receive. Ruzica has received various recognitions for research and teaching, including the Patrick Denantes Prize for her PhD thesis, a CACM Research Highlight paper, an NSF CAREER award, the Facebook Communications and Networking award, the Microsoft Research Award for the Software Engineering Innovation Foundation (SEIF), the Amazon Research Award, and the 2019 Ackerman Award for Teaching and Mentoring.



Nils Jansen

Radboud University Nijmegen

Safe RL: A Collection of Flavors

Abstract: Although Reinforcement Learning has shown promising results, there are still numerous challenges preventing its adoption in practice. In many real-world applications, safety is a paramount requirement, which is incompatible with the trial-and-error nature of RL.

In particular, RL agents operate in uncertain environments, which require them to explore in order to reduce the potentially very high uncertainty by gathering information. The key problem is that typical exploration strategies choose actions at random with potentially harmful consequences for the agent or its environment.

This problem has triggered research and enormous progress in the area of safe RL. Common approaches aim to render, for instance, the exploration safe with respect to particular, previously specified safety constraints. Various research communities are active in this area, such as artificial intelligence and machine learning in general, but also formal verification and control theory. Yet, these communities often work disconnected from each other and may not even be aware of each other's results.

This tutorial provides an overview of different perspectives on safe RL, in particular constrained RL and shielded RL. We highlight critical issues common to any approach to safe RL and pose the following questions:

  • What is the specific understanding of safety?

  • Which type of prior knowledge is needed to reason about the uncertainty of an agent's environment?

  • How safe is safe RL ultimately?

We critically examine and compare the answers to these questions for various approaches, and present in-depth examples of how to exploit prior knowledge to empower RL agents with safety guarantees.
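
As a concrete point of reference for the shielded-RL flavor mentioned above, the minimal Python sketch below shows the basic shielding idea: the learner proposes actions freely, and a shield, built from prior knowledge of the dynamics and a pre-specified safe set, overrides any action that would leave that set. The toy dynamics, safe set, and fallback action are assumptions made purely for illustration; the tutorial itself covers much richer settings, including uncertainty about the environment.

# Minimal sketch of shielded exploration (illustrative only, not a specific
# method from the talk): a shield overrides any proposed action whose next
# state would leave a pre-specified safe set. The environment model, safe
# set, and fallback action are assumptions made for this example.
import random

SAFE_STATES = set(range(0, 8))   # assumed prior knowledge: states 0..7 are safe
FALLBACK_ACTION = 0              # assumed always-safe action (stay in place)

def model_next_state(state, action):
    """Assumed known deterministic dynamics used by the shield."""
    return state + action        # toy dynamics: action in {-1, 0, +1}

def shield(state, proposed_action):
    """Return the proposed action if it keeps the agent in the safe set,
    otherwise substitute the fallback action."""
    if model_next_state(state, proposed_action) in SAFE_STATES:
        return proposed_action
    return FALLBACK_ACTION

# Shielded random exploration: the learner may propose any action, but only
# shielded actions are executed, so the safety constraint is never violated.
state = 3
for _ in range(10):
    proposed = random.choice([-1, 0, +1])   # unconstrained exploratory choice
    action = shield(state, proposed)
    state = model_next_state(state, action)
    assert state in SAFE_STATES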


Bio: Nils Jansen is a tenured assistant professor at the Institute for Computing and Information Science (iCIS) at Radboud University, Nijmegen, the Netherlands. He received his Ph.D. with distinction from RWTH Aachen University, Germany, in 2015. Prior to Radboud University, he was a postdoc and research associate at the University of Texas at Austin. His current research is on formal reasoning about safety and dependability aspects in artificial intelligence (AI). At the heart of his research is the development of concepts from formal methods and control theory to reason about uncertainty and partial information in AI systems. He holds several grants in this area, in both academic and industrial settings. He is a member of the European Lab for Learning and Intelligent Systems (ELLIS).