Reinforcement Learning under Partial Observability
NeurIPS 2018 Workshop
Saturday, December 8, 2018
Palais des Congrès de Montréal, Montréal, Canada
Reinforcement learning (RL) has succeeded in many challenging tasks, such as Atari, Go, and Chess, and even in high-dimensional continuous domains such as robotics. Most of these impressive successes, however, come in tasks where the agent observes the task features fully. In real-world problems, the agent usually has to rely on partial observations: in real-time games the agent makes only local observations; in robotics the agent has to cope with noisy sensors, occlusions, and unknown dynamics. Even more fundamentally, any agent without a full a priori world model or full access to the system state has to make decisions based on partial knowledge about the environment and its dynamics.
Reinforcement learning under partial observability has been tackled in the operations research, control, planning, and machine learning communities. One goal of the workshop is to bring researchers from these different backgrounds together. The workshop also aims to highlight future applications: in addition to robotics, where partial observability is a well-known challenge, diverse applications such as wireless networking, human-robot interaction, and autonomous driving require taking partial observability into account.
Partial observability introduces unique challenges: the agent has to remember the past and also connect the present with potential futures, which requires memory, exploration, and value-propagation techniques that can handle partial observability. Current model-based methods can handle discrete values and take long-term information gathering into account, while model-free methods can handle high-dimensional continuous problems. However, model-free methods often assume that the state space has been crafted for the problem at hand so that it carries sufficient information for optimal decision making, or they simply add memory to the policy without taking partial observability explicitly into account.
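To make the core difficulty concrete: since the agent cannot condition on the true state, model-based POMDP methods maintain a belief, a probability distribution over states, updated by Bayes' rule after each action and observation. The following is a minimal sketch of such a discrete belief update; the tiny two-state model and the 85% observation accuracy are hypothetical numbers chosen for illustration, not taken from any work presented at the workshop.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Discrete Bayes filter: b'(s') ∝ Z[a, s', o] * sum_s T[a, s, s'] * b(s).

    b : belief over states, shape (S,)
    T : transition model, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    Z : observation model, Z[a, s', o] = P(o | s', a), shape (A, S, O)
    """
    predicted = b @ T[a]                   # predict: marginalize over the current state
    unnormalized = Z[a, :, o] * predicted  # correct: weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Hypothetical two-state problem with one "listen" action and two observations.
T = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])              # listening does not change the state
Z = np.array([[[0.85, 0.15],
               [0.15, 0.85]]])            # observation matches the state 85% of the time

b = np.array([0.5, 0.5])                  # start fully uncertain
b = belief_update(b, a=0, o=0, T=T, Z=Z)
print(b)  # belief shifts toward state 0 after observing o = 0
```

Even this toy example shows why history matters: repeated noisy observations must be accumulated into the belief before a confident decision is possible, which is exactly the memory burden that model-free policies must otherwise carry implicitly.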
In this workshop, we want to go further and ask, among others, the following questions:
- For decision making under partial observability, is reinforcement learning the most suitable/effective approach to learning?
- How can we extend deep RL methods to robustly solve partially observable problems?
- Can we learn concise abstractions of history that are sufficient for high-quality decision-making?
- There have been several successes in decision making under partial observability despite the inherent challenges. Can we characterize problems where computing good policies is feasible?
- Since decision making is hard under partial observability, should we use more complex models and solve them approximately, use (inaccurate) simple models and solve them exactly, or not use models at all?
- How can we use control theory together with reinforcement learning to advance decision making under partial observability?
- Can we combine the strengths of model-based and model-free methods under partial observability?
- Can recent methodological advances in general RL already tackle some partially observable applications that were previously out of reach?
- How do we scale up reinforcement learning in multi-agent systems with partial observability?
- Do hierarchical models / temporal abstraction improve RL efficiency under partial observability?
Invited Speakers
Joelle Pineau (McGill University, Canada / Facebook)
Pieter Abbeel (UC Berkeley, USA)
David Silver (Google DeepMind / University College London, UK)
Leslie Kaelbling (MIT, USA)
Peter Stone (University of Texas at Austin, USA)
Anca Dragan (UC Berkeley, USA)
Jilles Dibangoye (INSA Lyon, France)
Schedule
08:30 AM Opening Remarks
08:40 AM Invited Talk: Joelle Pineau (McGill University, Canada / Facebook)
09:05 AM Invited Talk: Leslie Kaelbling (MIT, USA)
09:30 AM Contributed Talk 1: High-Level Strategy Selection under Partial Observability in StarCraft: Brood War Jonas Gehring, Da Ju, Vegard Mella, Daniel Gant, Nicolas Usunier and Gabriel Synnaeve
09:45 AM Invited Talk: David Silver (Google DeepMind / University College London, UK)
10:10 AM Contributed Talk 2: Joint Belief Tracking and Reward Optimization through Approximate Inference Pavel Shvechikov, Alexander Grishin, Arseny Kuznetsov, Alexander Fritzler and Dmitry Vetrov
10:25 AM Coffee Break
11:00 AM Contributed Talk 3: Learning Dexterous In-Hand Manipulation Marcin Andrychowicz, Bowen Baker, Maciej Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng and Wojciech Zaremba
11:15 AM Invited Talk: Pieter Abbeel (UC Berkeley, USA)
11:40 AM Spotlights & Poster Session
12:00 PM Lunch Break
02:00 PM Invited Talk: Peter Stone (University of Texas at Austin, USA)
02:25 PM Contributed Talk 4: Differentiable Algorithm Networks: Learning Wrong Models for Wrong Algorithms Peter Karkus, David Hsu, Leslie Pack Kaelbling and Tomas Lozano-Perez
02:40 PM Invited Talk: Jilles Dibangoye (INSA Lyon, France)
03:05 PM Coffee Break
03:35 PM Invited Talk: Anca Dragan (UC Berkeley, USA)
04:00 PM Panel Discussion
05:00 PM Poster Session
Organizers
Joni Pajarinen (TU Darmstadt, Germany)
Christopher Amato (Northeastern University, USA)
Pascal Poupart (University of Waterloo, Canada)
David Hsu (National University of Singapore, Singapore)