Deep Reinforcement Learning Workshop
NeurIPS 2020
About
In recent years, the use of deep neural networks as function approximators has enabled researchers to extend reinforcement learning techniques to solve increasingly complex control tasks. The emerging field of deep reinforcement learning has led to remarkable empirical results in rich and varied domains like robotics, strategy games, and multi-agent interactions. This workshop will bring together researchers working at the intersection of deep learning and reinforcement learning, and it will help interested researchers outside the field gain a high-level view of the current state of the art and potential directions for future contributions.
For previous editions, please visit NeurIPS 2019, 2018, 2017, 2016, 2015.
Submit your questions for panel discussions here: https://forms.gle/TE5du16CTWo6GXrQA
Schedule (December 11th 2020, 8:15am-7:00pm PST)
08:15 - 08:30 Welcome and Introduction
08:30 - 09:00 Pierre-Yves Oudeyer "Machines that invent their own problems: Towards open-ended learning of skills" [video]
09:00 - 10:00 Contributed talks
09:00 - 09:15 Sammy Christen "Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning" [video]
09:15 - 09:30 Saikrishna Gottipati "Maximum Reward Formulation In Reinforcement Learning" [video]
09:30 - 09:45 Karl Pertsch "Accelerating Reinforcement Learning with Learned Skill Priors" [video]
09:45 - 10:00 Hyeonwoo Noh "Asymmetric self-play for automatic goal discovery in robotic manipulation" [video]
10:00 - 10:30 Marc Bellemare "Autonomous navigation of stratospheric balloons using reinforcement learning" [live]
10:30 - 11:00 Break
11:00 - 11:30 Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" [video]
11:30 - 12:00 Contributed talks
12:00 - 12:30 Matt Botvinick "Alchemy: A Benchmark Task Distribution for Meta-Reinforcement Learning Research" [video]
12:30 - 13:30 Poster session 1 [gather.town] [poster locations]
13:30 - 14:00 Susan Murphy "We used RL but…. Did it work?!" [video]
14:00 - 14:30 Contributed talks
14:30 - 15:00 Anusha Nagabandi "Model-based Deep Reinforcement Learning for Robotic Systems" [video]
15:00 - 15:30 Break
15:30 - 16:00 Ashley Edwards "Learning Offline from Observation" [video]
16:00 - 16:30 NeurIPS RL Competitions Results Presentations
16:00 - 16:07 Sharada Mohanty "Flatland challenge" [video]
16:07 - 16:15 Antoine Marot "Learning to run a power network" [video]
16:15 - 16:22 Sharada Mohanty "Procgen challenge" [live]
16:30 - 17:00 Karen Liu "Deep Reinforcement Learning for Physical Human-Robot Interaction" [video]
17:00 - 18:00 Panel discussion (moderator: Pieter Abbeel)
Marc Bellemare, Matt Botvinick, Ashley Edwards, Karen Liu, Susan Murphy, Anusha Nagabandi, Pierre-Yves Oudeyer, Peter Stone
Submit your questions to the panel here: https://forms.gle/TE5du16CTWo6GXrQA
18:00 - 19:00 Poster session 2 [gather.town] [poster locations]
Important Dates and Deadlines
Paper submission deadline: October 5th 2020
Submission site: https://cmt3.research.microsoft.com/DRLW2020
Call for Papers & formatting instructions: [link]
Author Notification: October 23rd 2020
Workshop date: December 11th 2020
Workshop Time: 8:15am - 7:00pm PST
Invited Speakers
Organizers
UC Berkeley / Covariant
UC Berkeley
Stanford / Google
UC Berkeley
UC Berkeley
McGill / FAIR
University of Michigan
DeepMind
University of Michigan / DeepMind
University of Michigan
Accepted Papers
[pdf] [video] Amortized Variational Deep Q Network
Haotian Zhang (Xi'an Jiaotong University); Yuhao Wang (Xi'an Jiaotong University); Jianyong Sun (Xi'an Jiaotong University)*; Zongben Xu (Xi'an Jiaotong University)
[pdf] [video] DREAM: Deep Regret minimization with Advantage baselines and Model-free learning
Eric Steinberger (Climate Science); Adam Lerer (Facebook AI Research); Noam Brown (Facebook AI Research)
[pdf] [supplementary material] [video] Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning
Sammy Christen (ETH Zurich); Lukas Jendele (ETH Zurich); Emre Aksan (ETH Zurich); Otmar Hilliges (ETH Zurich)
[pdf] [video] Safety Aware Reinforcement Learning
Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs); Carroll Wainwright (Partnership on AI)
[pdf] [video] PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards
Prasoon Goyal (UT Austin); Scott Niekum (UT Austin); Raymond Mooney (UT Austin)
[pdf] [video] Asymmetric self-play for automatic goal discovery in robotic manipulation
OpenAI (OpenAI); Matthias Plappert (OpenAI); Raul Sampedro (OpenAI); Tao Xu (OpenAI); Ilge Akkaya (OpenAI); Vineet Kosaraju (OpenAI); Peter Welinder (OpenAI); Ruben D'Sa (OpenAI); Arthur Petron (OpenAI); Henrique Ponde (OpenAI); Alex Paino (OpenAI); Hyeonwoo Noh (OpenAI); Lilian Weng (OpenAI)*; Qiming Yuan (OpenAI); Casey Chu (OpenAI); Wojciech Zaremba (OpenAI)
[pdf] [video] Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication
Hardik Meisheri (TCS Research); Harshad Khadilkar (TCS Research)
[pdf] [supplementary material] [video] Multi-task Reinforcement Learning with a Planning Quasi-Metric
Vincent Micheli (EPFL); Karthigan Sinnathamby (EPFL); François Fleuret (University of Geneva)
[pdf] [video] Disentangled Planning and Control in Vision Based Robotics via Reward Machines
Alberto Camacho (Google); Jacob Varley (Google); Andy Zeng (Google); Deepali Jain (Google); Atil Iscen (Google); Dmitry Kalashnikov (Google)
[pdf] [supplementary material] [video] Maximum Mutation Reinforcement Learning for Scalable Control
Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)
[pdf] [supplementary material] [video] Energy-based Surprise Minimization for Multi-Agent Value Factorization
Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)
[pdf] [supplementary material] [video] Correcting Momentum in Temporal Difference Learning
Emmanuel Bengio (McGill University); Joelle Pineau (McGill / Facebook); Doina Precup (McGill University)
[pdf] [supplementary material] [video] A Policy Gradient Method for Task-Agnostic Exploration
Mirco Mutti (Politecnico di Milano, Università di Bologna); Lorenzo Pratissoli (Politecnico di Milano); Marcello Restelli (Politecnico di Milano)
[pdf] [video] Dream and Search to Control: Latent Space Planning for Continuous Control
Anurag Koul (Oregon State University); Varun Kumar (Intel AI Lab); Alan Fern (Oregon State University); Somdeb Majumdar (Intel Labs)
[pdf] [video] Unsupervised Task Clustering for Multi-Task Reinforcement Learning
Johannes Ackermann (Technical University of Munich); Oliver Richter (ETH Zurich); Roger Wattenhofer (ETH Zurich)
[pdf] [video] Learning Intrinsic Symbolic Rewards in Reinforcement Learning
Hassam Sheikh (University of Central Florida); Shauharda Khadka (Oregon State University); Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs)
[pdf] [video] Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity
Hassam Sheikh (University of Central Florida); Ladislau Boloni (University of Central Florida)
[pdf] [video] Quantifying Differences in Reward Functions
Adam Gleave (UC Berkeley); Michael Dennis (UC Berkeley); Shane Legg (DeepMind); Stuart Russell (UC Berkeley); Jan Leike (DeepMind)
[pdf] [video] Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies
Yunhao Tang (Columbia University); Krzysztof Choromanski (Google Brain Robotics)
[pdf] [video] DERAIL: Diagnostic Environments for Reward And Imitation Learning
Pedro Freire (Ecole Polytechnique); Adam Gleave (UC Berkeley); Sam Toyer (UC Berkeley); Stuart Russell (UC Berkeley)
[pdf] [video] Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations
Kalesha Bullard (Facebook AI Research); Franziska Meier (Facebook AI Research); Douwe Kiela (Facebook AI Research); Joelle Pineau (Facebook); Jakob Foerster (Facebook)
[pdf] [video] On Effective Parallelization of Monte Carlo Tree Search
Anji Liu (UCLA); Yitao Liang (UCLA); Ji Liu (Kwai Inc.); Guy Van den Broeck (UCLA); Jianshu Chen (Tencent AI Lab)
[pdf] [video] Unlocking the Potential of Deep Counterfactual Value Networks
Ryan Zarick (Minimal AI); Bryan Pellegrino (Minimal AI); Noam Brown (Facebook AI Research); Caleb Banister (Minimal AI)
[pdf] [video] FactoredRL: Leveraging Factored Graphs for Deep Reinforcement Learning
Bharathan Balaji (Amazon); Petros Christodoulou (Amazon); Xiaoyu Lu (Amazon); Byungsoo Jeon (Amazon); Jordan Bell-Masterson (Amazon)
[pdf] [video] Reusability and Transferability of Macro Actions for Reinforcement Learning
Yi Hsiang Chang (National Tsing Hua University); Kuan-Yu Chang (National Tsing Hua University); Henry Kuo (Harvard University); Chun-Yi Lee (National Tsing Hua University)
[pdf] [video] Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices
Evan Liu (Stanford University); Aditi Raghunathan (Stanford University); Percy Liang (Stanford University); Chelsea Finn (Stanford)
[pdf] Mastering Atari with Discrete World Models
Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Mohammad Norouzi (Google Research, Brain Team); Jimmy Ba (University of Toronto)
[pdf] [video] Action and Perception as Divergence Minimization
Danijar Hafner (Google); Pedro Ortega (DeepMind); Jimmy Ba (University of Toronto); Thomas Parr (University College London); Karl Friston (University College London); Nicolas Heess (DeepMind)
[pdf] [video] Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal (Google Research, Brain Team); Marlos C. Machado (Google Brain); Pablo Samuel Castro (Google); Marc G. Bellemare (Google Brain)
[pdf] [video] Skill Transfer via Partially Amortized Hierarchical Planning
Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Danijar Hafner (Google); Animesh Garg (University of Toronto, Vector Institute, Nvidia); Florian Shkurti (University of Toronto)
[pdf] [video] Average Reward Reinforcement Learning with Monotonic Policy Improvement
Yiming Zhang (New York University); Keith Ross (New York University Shanghai)
[pdf] [video] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
Xinyue Chen (NYU Shanghai); Che Wang (New York University); Zijian Zhou (NYU Shanghai); Keith Ross (New York University Shanghai)
[pdf] [video] Combating False Negatives in Adversarial Imitation Learning
Konrad Żołna (Jagiellonian University); Chitwan Saharia (Indian Institute of Technology, Bombay); Léonard Boussioux (MIT, CentraleSupélec); David Yu-Tung Hui (Mila); Maxime Chevalier-Boisvert (Mila, Université de Montréal); Dzmitry Bahdanau (Element AI); Yoshua Bengio (Mila)
[pdf] [video] Evaluating Agents Without Rewards
Brendon Matusch; Jimmy Ba (University of Toronto); Danijar Hafner (Google)
[pdf] [video] World Model as a Graph: Learning Latent Landmarks for Planning
Lunjun Zhang (University of Toronto); Ge Yang (University of Chicago); Bradly Stadie (Vector Institute)
[pdf] [video] Interactive Visualization for Debugging RL
Shuby Deshpande (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Jeff Schneider (Carnegie Mellon University)
[pdf] [video] Conservative Safety Critics for Exploration
Homanga Bharadhwaj (University of Toronto, Vector Institute); Aviral Kumar (UC Berkeley); Nicholas Rhinehart (UC Berkeley); Sergey Levine (UC Berkeley); Florian Shkurti (University of Toronto); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
[pdf] [video] D2RL: Deep Dense Architectures in Reinforcement Learning
Samarth Sinha (University of Toronto, Vector Institute); Homanga Bharadhwaj (University of Toronto, Vector Institute); Aravind Srinivas (UC Berkeley); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
[pdf] [video] Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates
Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley); Aravind Srinivas (UC Berkeley); Pieter Abbeel (UC Berkeley)
[pdf] [video] Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms
Chao Yu (Tsinghua University); Akash Velu (UC Berkeley); Eugene Vinitsky (UC Berkeley); Yu Wang (Tsinghua University); Alexandre Bayen (University of California, Berkeley); Yi Wu (OpenAI)
[pdf] [supplementary material] [video] A Deep Value-based Policy Search Approach for Real-world Vehicle Repositioning on Mobility-on-Demand Platforms
Yan Jiao (Didi Research America); Xiaocheng Tang (DiDi AI Labs); Zhiwei Qin (Didi Research America); Shuaiji Li (DiDi AI Labs); Fan Zhang (DiDi AI Labs); Hongtu Zhu (AI Labs, Didi Chuxing); Jieping Ye (Didi Chuxing)
[pdf] [video] Solving Compositional Reinforcement Learning Problems via Task Reduction
Yunfei Li (Tsinghua University); Huazhe Xu (UC Berkeley); Yilin Wu (Shanghai Qi Zhi Institute); Xiaolong Wang (UCSD); Yi Wu (OpenAI)
[pdf] [video] Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization
Zhenggang Tang (Peking University); Chao Yu (Tsinghua University); Boyuan Chen (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Fei Fang (Carnegie Mellon University); Simon Du (University of Washington); Yu Wang (Tsinghua University); Yi Wu (OpenAI)
[pdf] [video] FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance
Xiao-Yang Liu (Columbia University); Hongyang Yang (Columbia University); Qian Chen (Columbia University); Runjia Zhang (AI4Finance LLC); Liuqing Yang (Columbia University); Bowen Xiao (Imperial College); Christina Dan Wang (New York University)
[pdf] [video] What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz (Google); Anton Raichuk (Google); Piotr Stańczyk (Google Brain); Manu Orsini (Google Brain); Sertan Girgin (Google Brain); Raphael Marinier (Google); Léonard Hussenot (Google Research, Brain Team); Matthieu Geist (Google Brain); Olivier Pietquin (Google Research - Brain Team); Marcin Michalski (Google); Sylvain Gelly (Google Brain); Olivier Bachem (Google Brain)
[pdf] [video] Semantic State Representation for Reinforcement Learning
Erez Schwartz (Technion); Guy Tennenholtz (Technion); Chen Tessler (Technion); Shie Mannor (Technion)
[pdf] [video] Deep Q-Learning with Low Switching Cost
Shusheng Xu (Tsinghua University); Simon Du (University of Washington); Yi Wu (OpenAI)
[pdf] [supplementary material] [video] Diverse Exploration via InfoMax Options
Yuji Kanagawa (The University of Tokyo); Tomoyuki Kaneko (The University of Tokyo)
[pdf] [video] Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
Jiancong Huang (Guangdong University of Technology)
[pdf] [video] Learning to Represent Action Values as a Hypergraph on the Action Vertices
Arash Tavakoli (Imperial College London); Mehdi Fatemi (Microsoft Research); Petar Kormushev (Imperial College London)
[pdf] [video] Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning
Lin Guan (Arizona State University); Mudit Verma (Arizona State University); Sihang Guo (University of Texas at Austin); Ruohan Zhang (University of Texas at Austin); Subbarao Kambhampati (Arizona State University)
[pdf] [video] Goal-Conditioned Reinforcement Learning in the Presence of an Adversary
Carlos Purves (University of Cambridge); Pietro Liò (University of Cambridge); Cătălina Cangea (University of Cambridge)
[pdf] [supplementary material] [video] Regularized Inverse Reinforcement Learning
Wonseok Jeon (MILA, McGill University); Chen-Yang Su (MILA, McGill University); Paul Barde (MILA, McGill University); Thang Doan (Mila / McGill University); Derek Nowrouzezahrai (McGill University); Joelle Pineau (McGill / Facebook)
[pdf] [video] Planning from Pixels using Inverse Dynamics Models
Keiran Paster (University of Toronto); Sheila McIlraith (University of Toronto); Jimmy Ba (University of Toronto)
[pdf] [video] Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks
Glen Berseth (University of California Berkeley); Florian Golemo (Mila, ElementAI); Chris Pal (MILA, Polytechnique Montréal, Element AI)
[pdf] [video] Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
Nathan Lambert (UC Berkeley); Albert Wilcox (UC Berkeley); Howard Zhang (UC Berkeley); Kristofer Pister (UC Berkeley); Roberto Calandra (Facebook)
[pdf] [video] XLVIN: eXecuted Latent Value Iteration Nets
Andreea Deac (Mila, Université de Montréal); Petar Veličković (DeepMind); Ognjen Milinković (University of Belgrade); Pierre-Luc Bacon (Mila); Jian Tang (U Montreal); Mladen Nikolic (University of Belgrade)
[pdf] Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads
Suneel Belkhale (UC Berkeley); Rachel Li (University of California, Berkeley); Gregory Kahn (UC Berkeley); Rowan McAllister (UC Berkeley); Roberto Calandra (UC Berkeley)
[pdf] [supplementary material] [video] Targeted Query-based Action-Space Adversarial Policies on Deep Reinforcement Learning Agents
Xian Yeow Lee (Iowa State University); Yasaman Esfandiari (Iowa State University); Kai Liang Tan (Iowa State University); Soumik Sarkar (Iowa State University)
[pdf] [video] Parrot: Data-driven Behavioral Priors for Reinforcement Learning
Avi Singh (UC Berkeley); Huihan Liu (UC Berkeley); Gaoyue Zhou (University of California, Berkeley); Albert Yu (UC Berkeley); Nick Rhinehart (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] [video] Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets
Seunghyun Lee (KAIST); Younggyo Seo (KAIST); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley); Jinwoo Shin (KAIST)
[pdf] [video] C-Learning: Horizon-Aware Cumulative Accessibility Estimation
Panteha Naderian (Layer 6 AI); Gabriel Loaiza-Ganem (Layer 6 AI); Harry Braviner (Layer 6 AI); Anthony Caterini (Layer 6 AI); Jesse Cresswell (Layer 6 AI); Tong Li (Layer 6 AI); Animesh Garg (University of Toronto, Vector Institute, Nvidia)
[pdf] [video] Abstract Value Iteration for Hierarchical Deep Reinforcement Learning
Kishor Jothimurugan (University of Pennsylvania); Osbert Bastani (University of Pennsylvania); Rajeev Alur (University of Pennsylvania)
[pdf] [video] Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar (UC Berkeley); Rishabh Agarwal (Google Research, Brain Team); Dibya Ghosh (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] [video] Beyond Exponentially Discounted Sum: Automatic Learning of Return Function
Yufei Wang (Carnegie Mellon University); Qiwei Ye (Microsoft); Tie-Yan Liu (Microsoft)
[pdf] [video] TACTO: A Simulator for Learning Control from Touch Sensing
Shaoxiong Wang (MIT); Mike Lambeta (Facebook); Po-Wei Chou (Facebook); Roberto Calandra (Facebook)
[pdf] [video] XT2: Training an X-to-Text Typing Interface with Online Learning from Implicit Feedback
Jensen Gao (UC Berkeley); Siddharth Reddy (UC Berkeley); Glen Berseth (University of California Berkeley); Anca Dragan (EECS Department, University of California, Berkeley); Sergey Levine (UC Berkeley)
[pdf] [video] Safe Reinforcement Learning with Natural Language Constraints
Tsung-Yen Yang (Princeton University); Michael Hu (Princeton University); Yinlam Chow (Google AI); Peter Ramadge (Princeton); Karthik Narasimhan (Princeton University)
[pdf] [video] Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay
Lili Chen (UC Berkeley); Kimin Lee (UC Berkeley); Aravind Srinivas (UC Berkeley); Pieter Abbeel (UC Berkeley)
[pdf] [video] Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
Sungryull Sohn (University of Michigan); Sungtae Lee (Yonsei University); Jongwook Choi (University of Michigan); Honglak Lee (University of Michigan / Google Research); Harm van Seijen (Microsoft); Mehdi Fatemi (Microsoft Research)
[pdf] [video] Greedy Multi-Step Off-Policy Reinforcement Learning
Yuhui Wang (Nanjing University of Aeronautics and Astronautics, China); Xiaoyang Tan (Nanjing University of Aeronautics and Astronautics, China)
[pdf] [video] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay (MIT); Aviral Kumar (UC Berkeley); Pulkit Agrawal (UC Berkeley); Sergey Levine (Google); Ofir Nachum (Google)
[pdf] [video] Emergent Road Rules In Multi-Agent Driving Environments
Avik Pal (Indian Institute of Technology Kanpur); Jonah Philion (University of Toronto, NVIDIA); Yuan-Hong Liao (University of Toronto); Sanja Fidler (University of Toronto, NVIDIA)
[pdf] [supplementary material] [video] Modularity in Reinforcement Learning: An Algorithmic Causality Perspective on Credit Assignment
Michael Chang (UC Berkeley)*; Sid Kaushik (UC Berkeley)*; Sergey Levine (UC Berkeley); Tom Griffiths (Princeton)
[pdf] [video] An Examination of Preference-based Reinforcement Learning for Treatment Recommendation
Nan Xu (University of Southern California); Nitin Kamra (University of Southern California); Yan Liu (USC)
[pdf] [supplementary material] [video] Learning to Weight Imperfect Demonstrations
Yunke Wang (Wuhan University); Chang Xu (University of Sydney); Bo Du (Wuhan University); Honglak Lee (University of Michigan / Google Research)
[pdf] [supplementary material] [video] Structure and randomness in planning and reinforcement learning
Piotr Kozakowski (University of Warsaw); Piotr Januszewski (University of Warsaw & Gdansk University of Technology); Konrad Czechowski (University of Warsaw); Łukasz Kuciński (Polish Academy of Sciences); Piotr Miłoś (Polish Academy of Sciences)
[pdf] [video] Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning
Jongwook Choi (Google); Archit Sharma (Google); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Shixiang Gu (Google Brain)
[pdf] [video] Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
Chenyang Zhao (University of Edinburgh); Timothy Hospedales (Edinburgh University)
[pdf] [video] Data-Efficient Reinforcement Learning with Self-Predictive Representations
Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (MILA); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research); Aaron Courville (Universite de Montreal); Philip Bachman (Microsoft Research)
[pdf] [video] Accelerating Reinforcement Learning with Learned Skill Priors
Karl Pertsch (University of Southern California); Youngwoon Lee (University of Southern California); Joseph Lim (USC)
[pdf] [video] Model-based Navigation in Environments with Novel Layouts Using Abstract n-D Maps
Linfeng Zhao (Northeastern University); Lawson Wong (Northeastern University)
[pdf] [video] Parameter-based Value Functions
Francesco Faccio (The Swiss AI Lab IDSIA); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)
[pdf] [video] Online Safety Assurance for Deep Reinforcement Learning
Noga Rotman (Hebrew University of Jerusalem); Michael Schapira (Hebrew University); Aviv Tamar (UC Berkeley)
[pdf] [video] Lyapunov Barrier Policy Optimization
Harshit Sikchi (Carnegie Mellon University); Wenxuan Zhou (Carnegie Mellon University); David Held (CMU)
[pdf] [video] C-Learning: Learning to Achieve Goals via Recursive Classification
Ben Eysenbach (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)
[pdf] [video] Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Ben Eysenbach (Carnegie Mellon University); Shreyas Chaudhari (Carnegie Mellon University); Swapnil Asawa (University of Pittsburgh); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)
[pdf] [supplementary material] [video] Influence-aware Memory for Deep Reinforcement Learning in POMDPs
Miguel Suau (Delft University of Technology); Elena Congeduti (Delft University of Technology); Jinke He (Delft University of Technology); Rolf Starre (Delft University of Technology); Aleksander Czechowski (TU Delft); Frans Oliehoek (TU Delft)
[pdf] [video] Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models
Katie Kang (UC Berkeley); Gregory Kahn (UC Berkeley); Sergey Levine (University of California, Berkeley)
[pdf] [video] ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination
Martin Bertran (Duke University); Guillermo Sapiro (Duke University); Mariano Phielipp (Intel AI Lab)
[pdf] [video] Maximum Reward Formulation In Reinforcement Learning
Sai Krishna Gottipati (99andBeyond); Yashaswi Pathak (International Institute of Information Technology, Hyderabad); Rohan Nuttall (University of Alberta); Sahir (University of Alberta); Raviteja Chunduru (McGill University); Ahmed Touati (MILA); Sriram Ganapathi Subramanian (University of Waterloo); Matthew Taylor (U. of Alberta); Sarath Chandar (Mila)
[pdf] [video] How to make Deep RL work in Practice
Nirnai Rao (Technical University of Munich); Elie Aljalbout (Technical University of Munich); Axel Sauer (University of Tuebingen); Sami Haddadin (Technical University of Munich)
[pdf] [supplementary material] [video] Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning
Florian Fuchs (Sony); Yunlong Song (ETH / University of Zurich); Elia Kaufmann (ETH / University of Zurich); Davide Scaramuzza (University of Zurich & ETH Zurich, Switzerland); Peter Dürr (Sony Europe)
[pdf] [video] Model-Based Reinforcement Learning: A Compressed Survey
Thomas Moerland (Delft University of Technology); Joost Broekens (Leiden University); Catholijn Jonker (Delft University of Technology)
[pdf] [video] Evolving Reinforcement Learning Algorithms
John Co-Reyes (UC Berkeley); Yingjie Miao (Google); Daiyi Peng (Google Brain); Quoc Le (Google Brain); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Aleksandra Faust (Google Brain)
[pdf] [video] Learning to Reach Goals via Iterated Supervised Learning
Dibya Ghosh (UC Berkeley); Abhishek Gupta (UC Berkeley); Ashwin Reddy (UC Berkeley); Justin Fu (UC Berkeley); Coline Devin (University of California, Berkeley); Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)
[pdf] [video] Which Mutual-Information Representation Learning Objectives are Sufficient for Control?
Kate Rakelly (UC Berkeley); Abhishek Gupta (UC Berkeley); Carlos Florensa (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] [video] BeBold: Exploration Beyond the Boundary of Explored Regions
Tianjun Zhang (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Yi Wu (OpenAI); Kurt Keutzer (EECS, UC Berkeley); Joseph Gonzalez (UC Berkeley); Yuandong Tian (Facebook)
[pdf] [supplementary material] [video] Curriculum Learning through Distilled Discriminators
Rahul Siripurapu (USI); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)
[pdf] [video] Chaining Behaviors from Data with Model-Free Reinforcement Learning
Avi Singh (UC Berkeley); Albert Yu (UC Berkeley); Jonathan Yang (UC Berkeley); Aviral Kumar (UC Berkeley); Jesse Zhang (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] [supplementary material] [video] Self-Supervised Policy Adaptation during Deployment
Nicklas Hansen (Technical University of Denmark); Rishabh Jangir (University of California San Diego); Yu Sun (UC Berkeley); Guillem Alenyà (IRI); Pieter Abbeel (UC Berkeley); Alexei Efros (UC Berkeley); Lerrel Pinto (New York University); Xiaolong Wang (UCSD)
[pdf] [supplementary material] [video] Trust, but verify: model-based exploration in sparse reward environments
Konrad Czechowski (University of Warsaw); Tomasz Odrzygóźdź (Polish Academy of Sciences); Michał Izworski (University of Warsaw); Marek Zbysiński (University of Warsaw); Łukasz Kuciński (IMPAN); Piotr Miłoś (Polish Academy of Sciences)
[pdf] [video] Model-Based Visual Planning with Self-Supervised Functional Distances
Stephen Tian (UC Berkeley); Suraj Nair (Stanford University); Frederik Ebert (UC Berkeley); Sudeep Dasari (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)
[pdf] [video] A Unified View of Inference-based Off-Policy RL: Decoupling Algorithmic and Implementational Sources of Performance Differences
Hiroki Furuta (The University of Tokyo); Tadashi Kozuno (Okinawa Institute of Science and Technology); Tatsuya Matsushima (The University of Tokyo); Yutaka Matsuo (The University of Tokyo); Shixiang Gu (Google Brain)
[pdf] [supplementary material] [video] Pairwise Weights for Temporal Credit Assignment
Zeyu Zheng (University of Michigan); Risto Vuorio (University of Oxford); Richard Lewis (University of Michigan); Satinder Singh (UMich)
[pdf] [video] Learning to Sample with Local and Global Contexts in Experience Replay Buffer
Youngmin Oh (Samsung Advanced Institute of Technology); Kimin Lee (UC Berkeley); Jinwoo Shin (KAIST); Eunho Yang (KAIST;AITRICS); Sung Ju Hwang (KAIST, AITRICS)
[pdf] [video] Adversarial Environment Generation for Learning to Navigate the Web
Izzeddin Gur (Google); Natasha Jaques (UC Berkeley); Kevin Malta (Google); Manoj Tiwari (Google); Honglak Lee (Google / U. Michigan); Aleksandra Faust (Google Brain)
[pdf] [video] Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka (Intel Labs); Estelle Guez Aflalo (Intel Corp); Mattias Marder (Intel Corp); Avrech Ben-David (Technion); Santiago Miret (Intel Labs); Shie Mannor (Technion); Tamir Hazan (Technion); Hanlin Tang (Intel Corporation); Somdeb Majumdar (Intel Labs)*
[pdf] [video] Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning
Sumedh Sontakke (University of Southern California); Arash Mehrjou; Laurent Itti (University of Southern California); Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)
[pdf] [video] Optimizing Traffic Bottleneck Throughput using Cooperative, Decentralized Autonomous Vehicles
Eugene Vinitsky (UC Berkeley); Nathan Lichtle (ENS Paris-Saclay); Kanaad Parvate (UC Berkeley); Alexandre Bayen (University of California, Berkeley)
[pdf] [video] Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu (UC Berkeley); Aditya Grover (Stanford University); Pieter Abbeel (UC Berkeley); Igor Mordatch (Google)
[pdf] [video] Mirror Descent Policy Optimization
Manan Tomar (Facebook AI Research); Lior Shani (Technion); Yonathan Efroni (Microsoft Research); Mohammad Ghavamzadeh (Google Research)
[pdf] [video] Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
Taisei Hashimoto (The University of Tokyo); Yoshimasa Tsuruoka (The University of Tokyo)
[pdf] [video] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Fabio Pardo (Imperial College London)
[pdf] [video] Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research
Johan Obando Ceron (UAO); Pablo Samuel Castro (Google)
[pdf] [video] MaxEnt RL and Robust Control
Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)
[pdf] Bringing order into Actor-Critic Algorithms using Stackelberg Games
Robert Müller (Technical University of Munich)
[pdf] [video] Reinforcement Learning with Latent Flow
Wenling Shang (University of Amsterdam); Xiaofei Wang (University of California, Berkeley); Aravind Rajeswaran (University of Washington); Aravind Srinivas (UC Berkeley)*; Yang Gao (UC Berkeley); Michael Laskin (UC Berkeley)
[pdf] [video] Understanding Learned Reward Functions
Eric Michaud (University of California, Berkeley); Adam Gleave (University of California, Berkeley); Stuart Russell (UC Berkeley)
[pdf] [video] Addressing reward bias in Adversarial Imitation Learning with neutral reward functions
Rohit Jena (Carnegie Mellon University); Siddharth Agrawal (Carnegie Mellon University); Katia Sycara (Carnegie Mellon University)
[pdf] [video] Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment
Wilka Carvalho (University of Michigan); Anthony Liang (University of Michigan); Kimin Lee (UC Berkeley); Sungryull Sohn (University of Michigan); Honglak Lee (University of Michigan / Google Research); Richard Lewis (University of Michigan); Satinder Singh (UMich)
[pdf] [supplementary material] [video] Efficient Competitive Self-Play Policy Optimization
Yuanyi Zhong (University of Illinois at Urbana-Champaign); Yuan Zhou (UIUC); Jian Peng (University of Illinois at Urbana-Champaign)
[pdf] [video] Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael Zhang (University of Toronto); Thomas Paine (DeepMind); Ofir Nachum (Google); Cosmin Paduraru (DeepMind); George Tucker (Google Brain); Ziyu Wang (Google Research, Brain Team); Mohammad Norouzi (Google Research, Brain Team)
[pdf] [video] Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples
Kevin Li (UC Berkeley); Abhishek Gupta (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashwin Reddy (UC Berkeley); Aurick Zhou (UC Berkeley); Justin Yu (RAIL); Sergey Levine (UC Berkeley)
[pdf] Decoupling Representation Learning from Reinforcement Learning
Adam Stooke (UC Berkeley); Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley)
[pdf] [video] AWAC: Accelerating Online Reinforcement Learning With Offline Datasets
Ashvin Nair (UC Berkeley); Murtaza Dalal (Carnegie Mellon University); Abhishek Gupta (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] [video] Inter-Level Cooperation in Hierarchical Reinforcement Learning
Abdul Rahman Kreidieh (UC Berkeley); Glen Berseth (University of California Berkeley); Brandon Trabucco (UC Berkeley); Samyak Parajuli (University of California, Berkeley); Sergey Levine (UC Berkeley); Alexandre Bayen (UC Berkeley)
[pdf] [video] Model-Based Reinforcement Learning via Latent-Space Collocation
Oleh Rybkin (University of Pennsylvania); Chuning Zhu (University of Pennsylvania); Anusha Nagabandi (UC Berkeley); Kostas Daniilidis (University of Pennsylvania); Igor Mordatch (OpenAI); Sergey Levine (University of California, Berkeley)
[pdf] [video] Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Haotian Fu (Tianjin University); Hongyao Tang (Tianjin University); Jianye Hao (Huawei Noah's Ark Lab); Chen Chen (Huawei Noah's Ark Lab); Xidong Feng (Department of Automation, Tsinghua University; Huawei Noah's Ark Lab); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab)
[pdf] [video] PettingZoo: Gym for Multi-Agent Reinforcement Learning
Justin Terry (University of Maryland, College Park); Benjamin Black (UMD); Mario Jayakumar (University of Maryland, College Park); Ananth Hari (University of Maryland, College Park); Luis Santos (University of Maryland, College Park); Clemens Dieffendahl (Technical University of Berlin); Niall Williams (University of Maryland, College Park); Yashas Lokesh (University of Maryland, College Park); Caroline Horsch (University of Maryland, College Park); Praveen Ravi (University of Maryland, College Park); Ryan Sullivan (University of Maryland, College Park)
[pdf] [video] DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies
Soroush Nasiriany (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashvin Nair (UC Berkeley); Alexander Khazatsky (UC Berkeley); Glen Berseth (UC Berkeley); Sergey Levine (UC Berkeley)
[pdf] Multi-Agent Option Critic Architecture
Abhinav Gupta (Mila); Jhelum Chakravorty (McGill University); Jikun Kang (McGill University); Xue Liu (McGill University); Doina Precup (McGill University)
[pdf] [video] Measuring Visual Generalization in Continuous Control from Pixels
Jake Grigsby (University of Virginia); Yanjun Qi (University of Virginia)
[pdf] [video] Provably Efficient Policy Optimization via Thompson Sampling
Haque Ishfaq (Mila, McGill University); Zhuoran Yang (Princeton University); Andrei Lupu (Mila, McGill University); Viet Nguyen (Mila, McGill University); Lewis Liu (University of Montreal, Mila); Riashat Islam (MILA, McGill University); Zhaoran Wang (Northwestern); Doina Precup (McGill University)
[pdf] [video] Outcome-Driven Reinforcement Learning via Variational Inference
Tim G. J. Rudner (University of Oxford); Vitchyr Pong (UC Berkeley); Rowan McAllister (UC Berkeley); Yarin Gal (University of Oxford); Sergey Levine (UC Berkeley)
[pdf] [video] Policy Learning Using Weak Supervision
Jingkang Wang (Uber ATG, University of Toronto); Hongyi Guo (Shanghai Jiao Tong University); Zhaowei Zhu (UC Santa Cruz); Yang Liu (UC Santa Cruz)
[pdf] [supplementary material] [video] Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
Jun Yamada (University of Southern California); Youngwoon Lee (University of Southern California); Gautam Salhotra (University of Southern California); Karl Pertsch (University of Southern California); Max Pflueger (University of Southern California); Gaurav Sukhatme (University of Southern California); Joseph Lim (USC); Peter Englert (University of Southern California)
[pdf] [video] Discovery of Options via Meta-Gradients
Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (DeepMind)
[pdf] [supplementary material] [video] SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II
Xiangjun Wang (inspir.ai); Junxiao Song (inspir.ai)
[pdf] [supplementary material] [video] Unsupervised Domain Adaptation for Visual Navigation
Shangda Li (Carnegie Mellon University); Devendra Singh Chaplot (Carnegie Mellon University); Yao-Hung Tsai (Carnegie Mellon University); Yue Wu (Carnegie Mellon University); Louis-Philippe Morency (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University)
[pdf] [video] Continual Model-Based Reinforcement Learning with Hypernetworks
Yizhou Huang (University of Toronto); Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Florian Shkurti (University of Toronto)
[pdf] [supplementary material] [video] GRAC: Self-Guided and Self-Regularized Actor-Critic
Lin Shao (Stanford University); Yifan You (UCLA); Mengyuan Yan (Stanford University); Qingyun Sun (Stanford University); Jeannette Bohg (Stanford)
[pdf] [video] Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity
Tanmay Gangwani (UIUC); Jian Peng (UIUC); Yuan Zhou (UIUC)
[pdf] [video] R-LAtte: Visual Control via Deep Reinforcement Learning with Attention Network
Mandi Zhao (UC Berkeley); Qiyang Li (University of California, Berkeley); Aravind Srinivas (UC Berkeley); Ignasi Clavera (UC Berkeley); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley)
[pdf] [video] Domain Adversarial Reinforcement Learning
Bonnie Li (McGill); Vincent Francois-Lavet (McGill); Thang Doan (Mila / McGill); Joelle Pineau (McGill / Facebook)
[pdf] [supplementary material] [video] Latent State Models for Meta-Reinforcement Learning from Images
Anusha Nagabandi (UC Berkeley); Zihao Zhao (UC Berkeley); Kate Rakelly (UC Berkeley); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)
[pdf] [video] Learning Markov State Abstractions for Deep Reinforcement Learning
Cameron Allen (Brown University); Neev Parikh (Brown University); George Konidaris (Brown)
[pdf] [video] Backtesting Optimal Trade Execution Policies in Agent-Based Market Simulator
Siyu Lin (University of Virginia); Peter Beling (University of Virginia)
[pdf] [supplementary material] [video] Deep Bayesian Quadrature Policy Optimization
Ravi Tej Akella (Indian Institute of Technology Roorkee); Kamyar Azizzadenesheli (Purdue University); Mohammad Ghavamzadeh (Google Research); Animashree Anandkumar (Caltech); Yisong Yue (Caltech)
[pdf] [video] Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning
Sanghwa Lee (National Institute of Informatics); Jaeyoung Lee (University of Waterloo); Ichiro Hasuo (National Institute of Informatics & SOKENDAI)
[pdf] [supplementary material] [video] Value Generalization among Policies: Improving Value Function with Policy Representation
Hongyao Tang (Tianjin University); Zhaopeng Meng (School of Computer Software, Tianjin University); Jianye Hao (Tianjin University); Chen Chen (Huawei Noah’s Ark Lab); Daniel Graves (Huawei); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab); Yaodong Yang (Huawei Noah's Ark Lab)
[pdf] [supplementary material] [video] Successor Landmarks for Efficient Exploration and Long-Horizon Navigation
Christopher Hoang (University of Michigan); Sungryull Sohn (University of Michigan); Jongwook Choi (University of Michigan); Wilka Carvalho (University of Michigan); Honglak Lee (University of Michigan / Google Research)
[pdf] [video] Policy Guided Planning in Learned Latent Space
Mohammad Amini (Mila, McGill University); Doina Precup (McGill University); Sarath Chandar (Mila)
Program Committee
We would like to thank the following people for their effort in making this year's edition of the Deep RL Workshop a success!
David Abel
Pulkit Agrawal
Maruan Al Shedivat
Marcin Andrychowicz
Dilip Arumugam
Glen Berseth
Diana Borsa
Ethan Brooks
Noam Brown
Roberto Calandra
Wilka Carvalho
Devendra Singh Chaplot
Veronica Chelu
Richard Chen
Jongwook Choi
Ignasi Clavera
Thomas Degris
Harri Edwards
Jesse Farebrother
Jakob Foerster
Justin Fu
Yasuhiro Fujita
Shixiang Gu
Arthur Guez
Xiaoxiao Guo
Yijie Guo
Abhishek Gupta
David Ha
Tuomas Haarnoja
Danijar Hafner
Jessica Hamrick
Anna Harutyunyan
Karol Hausman
Rein Houthooft
Sandy Huang
Maximilian Igl
Riashat Islam
Max Jaderberg
Gregory Kahn
Khimya Khetarpal
Louis Kirsch
Ilya Kostrikov
Andrew Lampinen
Alex Lee
Lisa Lee
Ryan Lowe
Fangchen Liu
Qiyang Li
Rowan McAllister
Nikhil Mishra
Vlad Mnih
Aditya Modi
Igor Mordatch
Ofir Nachum
Anusha Nagabandi
Ashvin Nair
Suraj Nair
Karthik Narasimhan
Junhyuk Oh
Deepak Pathak
Xue Bin Peng
Lerrel Pinto
Vitchyr Pong
Aravind Rajeswaran
Sid Reddy
Oleh Rybkin
Tim Salimans
Tom Schaul
Pierre Sermanet
Rohin Shah
Archit Sharma
Max Smith
Sungryull Sohn
Aravind Srinivas
Bradly Stadie
Arthur Szlam
Aviv Tamar
Chen Tessler
Yuandong Tian
Sasha Vezhnevets
Risto Vuorio
Tony Wu
Yi Wu
Markus Wulfmeier
Ted Xiao
Zhongwen Xu
Huazhe Xu
Ge Yang
Dennis Yarats
Tianhe Yu
Tom Zahavy
Marvin Zhang
Shangtong Zhang
Qi Zhang
Amy Zhang
Zeyu Zheng
Allan Zhou
Luisa Zintgraf