Ensuring the offline reliability and online safety of reinforcement learning agents
Abstract: Reinforcement Learning (RL) agents can solve general problems with little to no knowledge of the underlying environment. These agents often learn through experience, using a trial-and-error strategy that can lead to practical innovations, but this randomized process may cause undesirable events. Safe RL studies how to make such agents more reliable and how to ensure they behave appropriately. We investigate these issues in online settings, where the agent interacts directly with the environment, and in offline settings, where the agent only has access to historical data. We develop new RL methods that exploit prior knowledge about the structure of the problem. In particular, we consider factored problems, where the dynamics of each state variable depend only on a small subset of variables. Exploiting this structure, we propose reliable offline algorithms that can improve the policy using less data, and online algorithms that comply with safety constraints while learning. Besides safety and reliability, we also touch on other issues preventing the deployment of RL to real-world tasks, such as partial observability, generalization, and high-dimensional data.
Bio: Thiago is a postdoctoral researcher in the Department of Software Science (SWS) at Radboud University Nijmegen, advised by Dr. Nils Jansen. Previously, he was a Ph.D. candidate in the Algorithmics Group at Delft University of Technology, advised by Dr. Matthijs Spaan. His research interests lie primarily in the automation of sequential decision-making, with a focus on reinforcement learning.
He obtained his M.Sc. degree in artificial intelligence from the Instituto de Matemática e Estatística at Universidade de São Paulo under the supervision of Prof. Leliane N. de Barros, and a bachelor's degree in computer science from the Departamento de Ciência da Computação at Universidade Federal de Lavras.