Deep Reinforcement Learning Workshop

NeurIPS 2020


About

In recent years, the use of deep neural networks as function approximators has enabled researchers to extend reinforcement learning techniques to solve increasingly complex control tasks. The emerging field of deep reinforcement learning has led to remarkable empirical results in rich and varied domains like robotics, strategy games, and multi-agent interactions. This workshop will bring together researchers working at the intersection of deep learning and reinforcement learning, and it will help interested researchers outside of the field gain a high-level view of the current state of the art and potential directions for future contributions.

For previous editions, please visit NeurIPS 2019, 2018, 2017, 2016, 2015.

Submit your questions for panel discussions here: https://forms.gle/TE5du16CTWo6GXrQA

Schedule (December 11th 2020, 8:15am-7:00pm PST)

  • 08:15 - 08:30 Welcome and Introduction

  • 08:30 - 09:00 Pierre-Yves Oudeyer "Machines that invent their own problems: Towards open-ended learning of skills" [video]

  • 09:00 - 10:00 Contributed talks

    • 09:00 - 09:15 Sammy Christen "Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning" [video]

    • 09:15 - 09:30 Saikrishna Gottipati "Maximum Reward Formulation In Reinforcement Learning" [video]

    • 09:30 - 09:45 Karl Pertsch "Accelerating Reinforcement Learning with Learned Skill Priors" [video]

    • 09:45 - 10:00 Hyeonwoo Noh "Asymmetric self-play for automatic goal discovery in robotic manipulation" [video]

  • 10:00 - 10:30 Marc Bellemare "Autonomous navigation of stratospheric balloons using reinforcement learning" [live]

  • 10:30 - 11:00 Break

  • 11:00 - 11:30 Peter Stone "Grounded Simulation Learning for Sim2Real with Connections to Off-Policy Reinforcement Learning" [video]

  • 11:30 - 12:00 Contributed talks

    • 11:30 - 11:45 Manan Tomar "Mirror Descent Policy Optimization" [video]

    • 11:45 - 12:00 Keiran Paster "Planning from Pixels using Inverse Dynamics Models" [video]

  • 12:00 - 12:30 Matt Botvinick "Alchemy: A Benchmark Task Distribution for Meta-Reinforcement Learning Research" [video]

  • 12:30 - 13:30 Poster session 1 [gather.town] [poster locations]

  • 13:30 - 14:00 Susan Murphy "We used RL but…. Did it work?!" [video]

  • 14:00 - 14:30 Contributed talks

    • 14:00 - 14:15 Ben Eysenbach "MaxEnt RL and Robust Control" [video]

    • 14:15 - 14:30 Kevin Lu "Reset-Free Lifelong Learning with Skill-Space Planning" [video]

  • 14:30 - 15:00 Anusha Nagabandi "Model-based Deep Reinforcement Learning for Robotic Systems" [video]

  • 15:00 - 15:30 Break

  • 15:30 - 16:00 Ashley Edwards "Learning Offline from Observation" [video]

  • 16:00 - 16:30 NeurIPS RL Competitions Results Presentations

  • 16:30 - 17:00 Karen Liu "Deep Reinforcement Learning for Physical Human-Robot Interaction" [video]

  • 17:00 - 18:00 Panel discussion (moderator: Pieter Abbeel)

    • Marc Bellemare, Matt Botvinick, Ashley Edwards, Karen Liu, Susan Murphy, Anusha Nagabandi, Pierre-Yves Oudeyer, Peter Stone

    • Submit your questions to the panel here: https://forms.gle/TE5du16CTWo6GXrQA

  • 18:00 - 19:00 Poster session 2 [gather.town] [poster locations]

Important Dates and Deadlines


Invited Speakers

Marc Bellemare

Google Brain

Karen Liu

Stanford

Organizers

Pieter Abbeel

UC Berkeley / Covariant

Coline Devin

UC Berkeley

Chelsea Finn

Stanford / Google

Misha Laskin

UC Berkeley

Kimin Lee

UC Berkeley

Joelle Pineau

McGill / FAIR

Janarthanan Rajendran

University of Michigan

David Silver

DeepMind

Satinder Singh

University of Michigan / DeepMind

Vivek Veeriah

University of Michigan

Accepted Papers

  • [pdf] [video] Amortized Variational Deep Q Network

    • Haotian Zhang (Xi'an Jiaotong University); Yuhao Wang (Xi'an Jiaotong University); Jianyong Sun (Xi'an Jiaotong University)*; Zongben Xu (Xi'an Jiaotong University)

  • [pdf] [video] DREAM: Deep Regret minimization with Advantage baselines and Model-free learning

    • Eric Steinberger (Climate Science); Adam Lerer (Facebook AI Research); Noam Brown (Facebook AI Research)

  • [pdf] [supplementary material] [video] Learning Functionally Decomposed Hierarchies for Continuous Control Tasks with Path Planning

    • Sammy Christen (ETH Zurich); Lukas Jendele (ETH Zurich); Emre Aksan (ETH Zurich); Otmar Hilliges (ETH Zurich)

  • [pdf] [video] Safety Aware Reinforcement Learning

    • Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs); Carroll Wainwright (Partnership on AI)

  • [pdf] [video] PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

    • Prasoon Goyal (UT Austin); Scott Niekum (UT Austin); Raymond Mooney (UT Austin)

  • [pdf] [video] Asymmetric self-play for automatic goal discovery in robotic manipulation

    • OpenAI (OpenAI); Matthias Plappert (OpenAI); Raul Sampedro (OpenAI); Tao Xu (OpenAI); Ilge Akkaya (OpenAI); Vineet Kosaraju (OpenAI); Peter Welinder (OpenAI); Ruben D'Sa (OpenAI); Arthur Petron (OpenAI); Henrique Ponde (OpenAI); Alex Paino (OpenAI); Hyeonwoo Noh (OpenAI); Lilian Weng (OpenAI)*; Qiming Yuan (OpenAI); Casey Chu (OpenAI); Wojciech Zaremba (OpenAI)

  • [pdf] [video] Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

    • Hardik Meisheri (TCS Research); Harshad Khadilkar (TCS Research)

  • [pdf] [supplementary material] [video] Multi-task Reinforcement Learning with a Planning Quasi-Metric

    • Vincent Micheli (EPFL); Karthigan Sinnathamby (EPFL); François Fleuret (University of Geneva)

  • [pdf] [video] Disentangled Planning and Control in Vision Based Robotics via Reward Machines

    • Alberto Camacho (Google); Jacob Varley (Google); Andy Zeng (Google); Deepali Jain (Google); Atil Iscen (Google); Dmitry Kalashnikov (Google)

  • [pdf] [supplementary material] [video] Maximum Mutation Reinforcement Learning for Scalable Control

    • Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)

  • [pdf] [supplementary material] [video] Energy-based Surprise Minimization for Multi-Agent Value Factorization

    • Karush Suri (University of Toronto); Xiao Qi Shi (RBC Capital Markets); Konstantinos Plataniotis (University of Toronto); Yuri Lawryshyn (University of Toronto)

  • [pdf] [supplementary material] [video] Correcting Momentum in Temporal Difference Learning

    • Emmanuel Bengio (McGill University); Joelle Pineau (McGill / Facebook); Doina Precup (McGill University)

  • [pdf] [supplementary material] [video] A Policy Gradient Method for Task-Agnostic Exploration

    • Mirco Mutti (Politecnico di Milano, Università di Bologna); Lorenzo Pratissoli (Politecnico di Milano); Marcello Restelli (Politecnico di Milano)

  • [pdf] [video] Dream and Search to Control: Latent Space Planning for Continuous Control

    • Anurag Koul (Oregon State University); Varun Kumar (Intel AI Lab); Alan Fern (Oregon State University); Somdeb Majumdar (Intel Labs)

  • [pdf] [video] Unsupervised Task Clustering for Multi-Task Reinforcement Learning

    • Johannes Ackermann (Technical University of Munich); Oliver Richter (ETH Zurich); Roger Wattenhofer (ETH Zurich)

  • [pdf] [video] Learning Intrinsic Symbolic Rewards in Reinforcement Learning

    • Hassam Sheikh (University of Central Florida); Shauharda Khadka (Oregon State University); Santiago Miret (Intel Labs); Somdeb Majumdar (Intel Labs)

  • [pdf] [video] Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity

    • Hassam Sheikh (University of Central Florida); Ladislau Boloni (University of Central Florida)

  • [pdf] [video] Quantifying Differences in Reward Functions

    • Adam Gleave (UC Berkeley); Michael Dennis (UC Berkeley); Shane Legg; Stuart Russell (UC Berkeley); Jan Leike (DeepMind)

  • [pdf] [video] Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

    • Yunhao Tang (Columbia University); Krzysztof Choromanski (Google Brain Robotics)

  • [pdf] [video] DERAIL: Diagnostic Environments for Reward And Imitation Learning

    • Pedro Freire (Ecole Polytechnique); Adam Gleave (UC Berkeley); Sam Toyer (UC Berkeley); Stuart Russell (UC Berkeley)

  • [pdf] [video] Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

    • Kalesha Bullard (Facebook AI Research); Franziska Meier (Facebook AI Research); Douwe Kiela (Facebook AI Research); Joelle Pineau (Facebook); Jakob Foerster (Facebook)

  • [pdf] [video] On Effective Parallelization of Monte Carlo Tree Search

    • Anji Liu (UCLA); Yitao Liang (UCLA); Ji Liu (Kwai Inc.); Guy Van den Broeck (UCLA); Jianshu Chen (Tencent AI Lab)

  • [pdf] [video] Unlocking the Potential of Deep Counterfactual Value Networks

    • Ryan Zarick (Minimal AI); Bryan Pellegrino (Minimal AI); Noam Brown (Facebook AI Research); Caleb Banister (Minimal AI)

  • [pdf] [video] FactoredRL: Leveraging Factored Graphs for Deep Reinforcement Learning

    • Bharathan Balaji (Amazon); Petros Christodoulou (Amazon); Xiaoyu Lu (Amazon); Byungsoo Jeon (Amazon); Jordan Bell-Masterson (Amazon)

  • [pdf] [video] Reusability and Transferability of Macro Actions for Reinforcement Learning

    • Yi Hsiang Chang (National Tsing Hua University); Kuan-Yu Chang (National Tsing Hua University); Henry Kuo (Harvard University); Chun-Yi Lee (National Tsing Hua University)

  • [pdf] [video] Decoupling Exploration and Exploitation in Meta-Reinforcement Learning without Sacrifices

    • Evan Liu (Stanford University); Aditi Raghunathan (Stanford University); Percy Liang (Stanford University); Chelsea Finn (Stanford)

  • [pdf] Mastering Atari with Discrete World Models

    • Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Mohammad Norouzi (Google Research, Brain Team); Jimmy Ba (University of Toronto)

  • [pdf] [video] Action and Perception as Divergence Minimization

    • Danijar Hafner (Google); Pedro Ortega (DeepMind); Jimmy Ba (University of Toronto); Thomas Parr (University College London); Karl Friston (University College London); Nicolas Heess (DeepMind)

  • [pdf] [video] Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

    • Rishabh Agarwal (Google Research, Brain Team); Marlos C. Machado (Google Brain); Pablo Samuel Castro (Google); Marc G. Bellemare (Google Brain)

  • [pdf] [video] Skill Transfer via Partially Amortized Hierarchical Planning

    • Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Danijar Hafner (Google); Animesh Garg (University of Toronto, Vector Institute, Nvidia); Florian Shkurti (University of Toronto)

  • [pdf] [video] Average Reward Reinforcement Learning with Monotonic Policy Improvement

    • Yiming Zhang (New York University); Keith Ross (New York University Shanghai)

  • [pdf] [video] Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

    • Xinyue Chen (NYU Shanghai); Che Wang (New York University); Zijian Zhou (NYU Shanghai); Keith Ross (New York University Shanghai)

  • [pdf] [video] Combating False Negatives in Adversarial Imitation Learning

    • Konrad Żołna (Jagiellonian University); Chitwan Saharia (Indian Institute of Technology, Bombay); Léonard Boussioux (MIT, CentraleSupélec); David Yu-Tung Hui (Mila); Maxime Chevalier-Boisvert (Mila, Université de Montréal); Dzmitry Bahdanau (Element AI); Yoshua Bengio (Mila)

  • [pdf] [video] Evaluating Agents Without Rewards

    • Brendon Matusch; Jimmy Ba (University of Toronto); Danijar Hafner (Google)

  • [pdf] [video] World Model as a Graph: Learning Latent Landmarks for Planning

    • Lunjun Zhang (University of Toronto); Ge Yang (University of Chicago); Bradly Stadie (Vector Institute)

  • [pdf] [video] Interactive Visualization for Debugging RL

    • Shuby Deshpande (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Jeff Schneider

  • [pdf] [video] Conservative Safety Critics for Exploration

    • Homanga Bharadhwaj (University of Toronto, Vector Institute); Aviral Kumar (UC Berkeley); Nicholas Rhinehart (UC Berkeley); Sergey Levine (UC Berkeley); Florian Shkurti (University of Toronto); Animesh Garg (University of Toronto, Vector Institute, Nvidia)

  • [pdf] [video] D2RL: Deep Dense Architectures in Reinforcement Learning

    • Samarth Sinha (University of Toronto, Vector Institute); Homanga Bharadhwaj (University of Toronto, Vector Institute); Aravind Srinivas (UC Berkeley); Animesh Garg (University of Toronto, Vector Institute, Nvidia)

  • [pdf] [video] Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

    • Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley); Aravind Srinivas (UC Berkeley); Pieter Abbeel (UC Berkeley)

  • [pdf] [video] Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms

    • Chao Yu (Tsinghua University); Akash Velu (UC Berkeley); Eugene Vinitsky (UC Berkeley); Yu Wang (Tsinghua University); Alexandre Bayen (University of California, Berkeley); Yi Wu (OpenAI)

  • [pdf] [supplementary material] [video] A Deep Value-based Policy Search Approach for Real-world Vehicle Repositioning on Mobility-on-Demand Platforms

    • Yan Jiao (Didi Research America); Xiaocheng Tang (DiDi AI Labs); Zhiwei Qin (Didi Research America); Shuaiji Li (DiDi AI Labs); Fan Zhang (DiDi AI Labs); Hongtu Zhu (AI Labs, Didi Chuxing); Jieping Ye (Didi Chuxing)

  • [pdf] [video] Solving Compositional Reinforcement Learning Problems via Task Reduction

    • Yunfei Li (Tsinghua University); Huazhe Xu (UC Berkeley); Yilin Wu (Shanghai Qi Zhi Institute); Xiaolong Wang (UCSD); Yi Wu (OpenAI)

  • [pdf] [video] Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

    • Zhenggang Tang (Peking University); Chao Yu (Tsinghua University); Boyuan Chen (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Fei Fang (Carnegie Mellon University); Simon Du (University of Washington); Yu Wang (Tsinghua University); Yi Wu (OpenAI)

  • [pdf] [video] FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

    • Xiao-Yang Liu (Columbia University); Hongyang Yang (Columbia University); Qian Chen (Columbia University); Runjia Zhang (AI4Finance LLC); Liuqing Yang (Columbia University); Bowen Xiao (Imperial College); Christina Dan Wang (New York University)

  • [pdf] [video] What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study

    • Marcin Andrychowicz (Google); Anton Raichuk (Google); Piotr Stańczyk (Google Brain); Manu Orsini (Google Brain); Sertan Girgin (Google Brain); Raphael Marinier (Google); Léonard Hussenot (Google Research, Brain Team); Matthieu Geist (Google Brain); Olivier Pietquin (Google Research - Brain Team); Marcin Michalski (Google); Sylvain Gelly (Google Brain); Olivier Bachem (Google Brain)

  • [pdf] [video] Semantic State Representation for Reinforcement Learning

    • Erez Schwartz (Technion); Guy Tennenholtz (Technion); Chen Tessler (Technion); Shie Mannor (Technion)

  • [pdf] [video] Deep Q-Learning with Low Switching Cost

    • Shusheng Xu (Tsinghua University); Simon Du (University of Washington); Yi Wu (OpenAI)

  • [pdf] [supplementary material] [video] Diverse Exploration via InfoMax Options

    • Yuji Kanagawa (The University of Tokyo); Tomoyuki Kaneko (The University of Tokyo)

  • [pdf] [video] Hyperparameter Auto-tuning in Self-Supervised Robotic Learning

    • Jiancong Huang (Guangdong University of Technology)

  • [pdf] [video] Learning to Represent Action Values as a Hypergraph on the Action Vertices

    • Arash Tavakoli (Imperial College London); Mehdi Fatemi (Microsoft Research); Petar Kormushev (Imperial College London)

  • [pdf] [video] Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning

    • Lin Guan (Arizona State University); Mudit Verma (Arizona State University); Sihang Guo (University of Texas at Austin); Ruohan Zhang (University of Texas at Austin); Subbarao Kambhampati (Arizona State University)

  • [pdf] [video] Goal-Conditioned Reinforcement Learning in the Presence of an Adversary

    • Carlos Purves (University of Cambridge); Pietro Liò (University of Cambridge); Cătălina Cangea (University of Cambridge)

  • [pdf] [supplementary material] [video] Regularized Inverse Reinforcement Learning

    • Wonseok Jeon (MILA, McGill University); Chen-Yang Su (MILA, McGill University); Paul Barde (MILA, McGill University); Thang Doan (Mila / McGill University); Derek Nowrouzezahrai (McGill University); Joelle Pineau (McGill / Facebook)

  • [pdf] [video] Planning from Pixels using Inverse Dynamics Models

    • Keiran Paster (University of Toronto); Sheila McIlraith (University of Toronto); Jimmy Ba (University of Toronto)

  • [pdf] [video] Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks

    • Glen Berseth (University of California Berkeley); Florian Golemo (Mila, ElementAI); Chris Pal (MILA, Polytechnique Montréal, Element AI)

  • [pdf] [video] Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

    • Nathan Lambert (UC Berkeley); Albert Wilcox (UC Berkeley); Howard Zhang (UC Berkeley); Kristofer Pister (UC Berkeley); Roberto Calandra (Facebook)

  • [pdf] [video] XLVIN: eXecuted Latent Value Iteration Nets

    • Andreea Deac (Mila, Université de Montréal); Petar Veličković (DeepMind); Ognjen Milinković (University of Belgrade); Pierre-Luc Bacon (Mila); Jian Tang (U Montreal); Mladen Nikolic (University of Belgrade)

  • [pdf] Model-Based Meta-Reinforcement Learning for Flight with Suspended Payloads

    • Suneel Belkhale (UC Berkeley); Rachel Li (University of California, Berkeley); Gregory Kahn (UC Berkeley); Rowan McAllister (UC Berkeley); Roberto Calandra (UC Berkeley)

  • [pdf] [supplementary material] [video] Targeted Query-based Action-Space Adversarial Policies on Deep Reinforcement Learning Agents

    • Xian Yeow Lee (Iowa State University); Yasaman Esfandiari (Iowa State University); Kai Liang Tan (Iowa State University); Soumik Sarkar (Iowa State University)

  • [pdf] [video] Parrot: Data-driven Behavioral Priors for Reinforcement Learning

    • Avi Singh (UC Berkeley); Huihan Liu (UC Berkeley); Gaoyue Zhou (University of California, Berkeley); Albert Yu (UC Berkeley); Nick Rhinehart; Sergey Levine (UC Berkeley)

  • [pdf] [video] Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets

    • Seunghyun Lee (KAIST); Younggyo Seo (KAIST); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley); Jinwoo Shin (KAIST)

  • [pdf] [video] C-Learning: Horizon-Aware Cumulative Accessibility Estimation

    • Panteha Naderian (Layer 6 AI); Gabriel Loaiza-Ganem (Layer 6 AI); Harry Braviner (Layer 6 AI); Anthony Caterini (Layer 6 AI); Jesse Cresswell (Layer 6 AI); Tong Li (Layer 6 AI); Animesh Garg (University of Toronto, Vector Institute, Nvidia)

  • [pdf] [video] Abstract Value Iteration for Hierarchical Deep Reinforcement Learning

    • Kishor Jothimurugan (University of Pennsylvania); Osbert Bastani (University of Pennsylvania); Rajeev Alur (University of Pennsylvania)

  • [pdf] [video] Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

    • Aviral Kumar (UC Berkeley); Rishabh Agarwal (Google Research, Brain Team); Dibya Ghosh (UC Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] [video] Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

    • Yufei Wang (Carnegie Mellon University); Qiwei Ye (Microsoft); Tie-Yan Liu (Microsoft)

  • [pdf] [video] TACTO: A Simulator for Learning Control from Touch Sensing

    • Shaoxiong Wang (MIT); Mike Lambeta (Facebook); Po-Wei Chou (Facebook); Roberto Calandra (Facebook)

  • [pdf] [video] XT2: Training an X-to-Text Typing Interface with Online Learning from Implicit Feedback

    • Jensen Gao (UC Berkeley); Siddharth Reddy (UC Berkeley); Glen Berseth (University of California Berkeley); Anca Dragan (EECS Department, University of California, Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] [video] Safe Reinforcement Learning with Natural Language Constraints

    • Tsung-Yen Yang (Princeton University); Michael Hu (Princeton University); Yinlam Chow (Google AI); Peter Ramadge (Princeton); Karthik Narasimhan (Princeton University)

  • [pdf] [video] Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay

    • Lili Chen (UC Berkeley); Kimin Lee (UC Berkeley); Aravind Srinivas; Pieter Abbeel (UC Berkeley)

  • [pdf] [video] Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

    • Sungryull Sohn (University of Michigan); Sungtae Lee (Yonsei University); Jongwook Choi (University of Michigan); Honglak Lee (University of Michigan / Google Research); Harm van Seijen (Microsoft); Mehdi Fatemi (Microsoft Research)

  • [pdf] [video] Greedy Multi-Step Off-Policy Reinforcement Learning

    • Yuhui Wang (Nanjing University of Aeronautics and Astronautics, China); Xiaoyang Tan (Nanjing University of Aeronautics and Astronautics, China)

  • [pdf] [video] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

    • Anurag Ajay (MIT); Aviral Kumar (UC Berkeley); Pulkit Agrawal (UC Berkeley); Sergey Levine (Google); Ofir Nachum (Google)

  • [pdf] [video] Emergent Road Rules In Multi-Agent Driving Environments

    • Avik Pal (Indian Institute of Technology Kanpur); Jonah Philion (University of Toronto, NVIDIA); Yuan-Hong Liao (University of Toronto); Sanja Fidler (University of Toronto, NVIDIA)

  • [pdf] [supplementary material] [video] Modularity in Reinforcement Learning: An Algorithmic Causality Perspective on Credit Assignment

    • Michael Chang (UC Berkeley)*; Sid Kaushik (UC Berkeley)*; Sergey Levine (UC Berkeley); Tom Griffiths (Princeton)

  • [pdf] [video] An Examination of Preference-based Reinforcement Learning for Treatment Recommendation

    • Nan Xu (University of Southern California); Nitin Kamra (University of Southern California); Yan Liu (USC)

  • [pdf] [supplementary material] [video] Learning to Weight Imperfect Demonstrations

    • Yunke Wang (Wuhan University); Chang Xu (University of Sydney); Bo Du (Wuhan University); Honglak Lee (University of Michigan / Google Research)

  • [pdf] [supplementary material] [video] Structure and randomness in planning and reinforcement learning

    • Piotr Kozakowski (University of Warsaw); Piotr Januszewski (University of Warsaw & Gdansk University of Technology); Konrad Czechowski (University of Warsaw); Łukasz Kuciński (Polish Academy of Sciences); Piotr Miłoś (Polish Academy of Sciences)

  • [pdf] [video] Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning

    • Jongwook Choi (Google); Archit Sharma (Google); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Shixiang Gu (Google Brain)

  • [pdf] [video] Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

    • Chenyang Zhao (University of Edinburgh); Timothy Hospedales (Edinburgh University)

  • [pdf] [video] Data-Efficient Reinforcement Learning with Self-Predictive Representations

    • Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (MILA); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research); Aaron Courville (Universite de Montreal); Philip Bachman (Microsoft Research)

  • [pdf] [video] Accelerating Reinforcement Learning with Learned Skill Priors

    • Karl Pertsch (University of Southern California); Youngwoon Lee (University of Southern California); Joseph Lim (USC)

  • [pdf] [video] Model-based Navigation in Environments with Novel Layouts Using Abstract n-D Maps

    • Linfeng Zhao (Northeastern University); Lawson Wong (Northeastern University)

  • [pdf] [video] Parameter-based Value Functions

    • Francesco Faccio (The Swiss AI Lab IDSIA); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)

  • [pdf] [video] Online Safety Assurance for Deep Reinforcement Learning

    • Noga Rotman (Hebrew University of Jerusalem); Michael Schapira (Hebrew University); Aviv Tamar (UC Berkeley)

  • [pdf] [video] Lyapunov Barrier Policy Optimization

    • Harshit Sikchi (Carnegie Mellon University); Wenxuan Zhou (Carnegie Mellon University); David Held (CMU)

  • [pdf] [video] C-Learning: Learning to Achieve Goals via Recursive Classification

    • Ben Eysenbach (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)

  • [pdf] [video] Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

    • Ben Eysenbach (Carnegie Mellon University); Shreyas Chaudhari (Carnegie Mellon University); Swapnil Asawa (University of Pittsburgh); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley)

  • [pdf] [supplementary material] [video] Influence-aware Memory for Deep Reinforcement Learning in POMDPs

    • Miguel Suau (Delft University of Technology); Elena Congeduti (Delft University of Technology); Jinke He (Delft University of Technology); Rolf Starre (Delft University of Technology); Aleksander Czechowski (TU Delft); Frans Oliehoek (TU Delft)

  • [pdf] [video] Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models

    • Katie Kang (UC Berkeley); Gregory Kahn (UC Berkeley); Sergey Levine (University of California, Berkeley)

  • [pdf] [video] ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination

    • Martin Bertran (Duke University); Guillermo Sapiro (Duke University); Mariano Phielipp (Intel AI Lab)

  • [pdf] [video] Maximum Reward Formulation In Reinforcement Learning

    • Sai Krishna Gottipati (99andBeyond); Yashaswi Pathak (International Institute of Information Technology, Hyderabad); Rohan Nuttall (University of Alberta); Sahir (University of Alberta); Raviteja Chunduru (McGill University); Ahmed Touati (MILA); Sriram Ganapathi Subramanian (University of Waterloo); Matthew Taylor (U. of Alberta); Sarath Chandar (Mila)

  • [pdf] [video] How to make Deep RL work in Practice

    • Nirnai Rao (Technical University of Munich); Elie Aljalbout (Technical University of Munich); Axel Sauer (University of Tuebingen); Sami Haddadin (Technical University of Munich)

  • [pdf] [supplementary material] [video] Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

    • Florian Fuchs (Sony); Yunlong Song (ETH / University of Zurich); Elia Kaufmann (ETH / University of Zurich); Davide Scaramuzza (University of Zurich & ETH Zurich, Switzerland); Peter Dürr (Sony Europe)

  • [pdf] [video] Model-Based Reinforcement Learning: A Compressed Survey

    • Thomas Moerland (Delft University of Technology); Joost Broekens (Leiden University); Catholijn Jonker (Delft University of Technology)

  • [pdf] [video] Evolving Reinforcement Learning Algorithms

    • John Co-Reyes (UC Berkeley); Yingjie Miao (Google); Daiyi Peng (Google Brain); Quoc Le (Google Brain); Sergey Levine (Google); Honglak Lee (Google / U. Michigan); Aleksandra Faust (Google Brain)

  • [pdf] [video] Learning to Reach Goals via Iterated Supervised Learning

    • Dibya Ghosh (UC Berkeley); Abhishek Gupta (UC Berkeley); Ashwin Reddy (UC Berkeley); Justin Fu (UC Berkeley); Coline Devin (University of California, Berkeley); Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)

  • [pdf] [video] Which Mutual-Information Representation Learning Objectives are Sufficient for Control?

    • Kate Rakelly (UC Berkeley); Abhishek Gupta (UC Berkeley); Carlos Florensa (UC Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] [video] BeBold: Exploration Beyond the Boundary of Explored Regions

    • Tianjun Zhang (UC Berkeley); Huazhe Xu (UC Berkeley); Xiaolong Wang (UCSD); Yi Wu (OpenAI); Kurt Keutzer (EECS, UC Berkeley); Joseph Gonzalez (UC Berkeley); Yuandong Tian (Facebook)

  • [pdf] [supplementary material] [video] Curriculum Learning through Distilled Discriminators

    • Rahul Siripurapu (USI); Louis Kirsch (Swiss AI Lab IDSIA); Jürgen Schmidhuber (IDSIA - Lugano)

  • [pdf] [video] Chaining Behaviors from Data with Model-Free Reinforcement Learning

    • Avi Singh (UC Berkeley); Albert Yu (UC Berkeley); Jonathan Yang (UC Berkeley); Aviral Kumar (UC Berkeley); Jesse Zhang (UC Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] [supplementary material] [video] Self-Supervised Policy Adaptation during Deployment

    • Nicklas Hansen (Technical University of Denmark); Rishabh Jangir (University of California San Diego); Yu Sun; Guillem Alenyà (IRI); Pieter Abbeel (UC Berkeley); Alexei Efros (UC Berkeley); Lerrel Pinto (New York University); Xiaolong Wang (UCSD)

  • [pdf] [supplementary material] [video] Trust, but verify: model-based exploration in sparse reward environments

    • Konrad Czechowski (University of Warsaw); Tomasz Odrzygóźdź (Polish Academy of Sciences); Michał Izworski (University of Warsaw); Marek Zbysiński (University of Warsaw); Lukasz Kucinski (IMPAN); Piotr Miłoś (Polish Academy of Sciences)

  • [pdf] [video] Model-Based Visual Planning with Self-Supervised Functional Distances

    • Stephen Tian (UC Berkeley); Suraj Nair (Stanford University); Frederik Ebert (UC Berkeley); Sudeep Dasari (Carnegie Mellon University); Ben Eysenbach (Carnegie Mellon University); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)

  • [pdf] [video] A Unified View of Inference-based Off-Policy RL: Decoupling Algorithmic and Implementational Sources of Performance Differences

    • Hiroki Furuta (The University of Tokyo); Tadashi Kozuno (Okinawa Institute of Science and Technology); Tatsuya Matsushima (The University of Tokyo); Yutaka Matsuo (The University of Tokyo); Shixiang Gu (Google Brain)

  • [pdf] [supplementary material] [video] Pairwise Weights for Temporal Credit Assignment

    • Zeyu Zheng (University of Michigan); Risto Vuorio (University of Oxford); Richard Lewis (University of Michigan); Satinder Singh (UMich)

  • [pdf] [video] Learning to Sample with Local and Global Contexts in Experience Replay Buffer

    • Youngmin Oh (Samsung Advanced Institute of Technology); Kimin Lee (UC Berkeley); Jinwoo Shin (KAIST); Eunho Yang (KAIST;AITRICS); Sung Ju Hwang (KAIST, AITRICS)

  • [pdf] [video] Adversarial Environment Generation for Learning to Navigate the Web

    • Izzeddin Gur (Google); Natasha Jaques (UC Berkeley); Kevin Malta (Google); Manoj Tiwari (Google); Aleksandra Faust (Google Brain); Honglak Lee (Google / U. Michigan)

  • [pdf] [video] Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

    • Shauharda Khadka (Intel Labs); Estelle Guez Aflalo (Intel Corp); Mattias Marder (Intel Corp); Avrech Ben-David (Technion); Santiago Miret (Intel Labs); Shie Mannor (Technion); Tamir Hazan (Technion); Hanlin Tang (Intel Corporation); Somdeb Majumdar (Intel Labs)*

  • [pdf] [video] Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning

    • Sumedh Sontakke (University of Southern California); Arash Mehrjou (Mr.); Laurent Itti (University of Southern California); Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

  • [pdf] [video] Optimizing Traffic Bottleneck Throughput using Cooperative, Decentralized Autonomous Vehicles

    • Eugene Vinitsky (UC Berkeley); Nathan Lichtle (ENS Paris-Saclay); Kanaad Parvate (UC Berkeley); Alexandre Bayen (University of California, Berkeley)

  • [pdf] [video] Reset-Free Lifelong Learning with Skill-Space Planning

    • Kevin Lu (UC Berkeley); Aditya Grover (Stanford University); Pieter Abbeel (UC Berkeley); Igor Mordatch (Google)

  • [pdf] [video] Mirror Descent Policy Optimization

    • Manan Tomar (Facebook AI Research); Lior Shani (Technion); Yonathan Efroni (Microsoft Research); Mohammad Ghavamzadeh (Google Research)

  • [pdf] [video] Utilizing Skipped Frames in Action Repeats via Pseudo-Actions

    • Taisei Hashimoto (The University of Tokyo); Yoshimasa Tsuruoka (The University of Tokyo)

  • [pdf] [video] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking

    • Fabio Pardo (Imperial College London)

  • [pdf] [video] Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

    • Johan Obando Ceron (UAO); Pablo Samuel Castro (Google)

  • [pdf] [video] MaxEnt RL and Robust Control

    • Ben Eysenbach (Carnegie Mellon University); Sergey Levine (UC Berkeley)

  • [pdf] Bringing order into Actor-Critic Algorithms using Stackelberg Games

    • Robert Müller (Technical University of Munich)

  • [pdf] [video] Reinforcement Learning with Latent Flow

    • Wenling Shang (University of Amsterdam); Xiaofei Wang (University of California, Berkeley); Aravind Rajeswaran (University of Washington); Aravind Srinivas (UC Berkeley); Yang Gao (UC Berkeley); Michael Laskin (UC Berkeley)

  • [pdf] [video] Understanding Learned Reward Functions

    • Eric Michaud (University of California, Berkeley); Adam Gleave (University of California, Berkeley); Stuart Russell (UC Berkeley)

  • [pdf] [video] Addressing reward bias in Adversarial Imitation Learning with neutral reward functions

    • Rohit Jena (Carnegie Mellon University); Siddharth Agrawal (Carnegie Mellon University); Katia Sycara (Carnegie Mellon University)

  • [pdf] [video] Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment

    • Wilka Carvalho (University of Michigan); Anthony Liang (University of Michigan); Kimin Lee (UC Berkeley); Sungryull Sohn (University of Michigan); Honglak Lee (University of Michigan / Google Research); Richard Lewis (University of Michigan); Satinder Singh (UMich)

  • [pdf] [supplementary material] [video] Efficient Competitive Self-Play Policy Optimization

    • Yuanyi Zhong (University of Illinois at Urbana-Champaign); Yuan Zhou (UIUC); Jian Peng (University of Illinois at Urbana-Champaign)

  • [pdf] [video] Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

    • Michael Zhang (University of Toronto); Thomas Paine (DeepMind); Ofir Nachum (Google); Cosmin Paduraru (DeepMind); George Tucker (Google Brain); Ziyu Wang (Google Research, Brain Team); Mohammad Norouzi (Google Research, Brain Team)

  • [pdf] [video] Reinforcement Learning with Bayesian Classifiers: Efficient Skill Learning from Outcome Examples

    • Kevin Li (UC Berkeley); Abhishek Gupta (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashwin Reddy (UC Berkeley); Aurick Zhou (UC Berkeley); Justin Yu (RAIL); Sergey Levine (UC Berkeley)

  • [pdf] Decoupling Representation Learning from Reinforcement Learning

    • Adam Stooke (UC Berkeley); Kimin Lee (UC Berkeley); Michael Laskin (UC Berkeley)

  • [pdf] [video] AWAC: Accelerating Online Reinforcement Learning With Offline Datasets

    • Ashvin Nair (UC Berkeley); Murtaza Dalal (Carnegie Mellon University); Abhishek Gupta (UC Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] [video] Inter-Level Cooperation in Hierarchical Reinforcement Learning

    • Abdul Rahman Kreidieh (UC Berkeley); Glen Berseth (University of California Berkeley); Brandon Trabucco (UC Berkeley); Samyak Parajuli (University of California, Berkeley); Sergey Levine (UC Berkeley); Alexandre Bayen (UC Berkeley)

  • [pdf] [video] Model-Based Reinforcement Learning via Latent-Space Collocation

    • Oleh Rybkin (University of Pennsylvania); Chuning Zhu (University of Pennsylvania); Anusha Nagabandi (UC Berkeley); Kostas Daniilidis (University of Pennsylvania); Igor Mordatch (OpenAI); Sergey Levine (University of California, Berkeley)

  • [pdf] [video] Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning

    • Haotian Fu (Tianjin University); Hongyao Tang (Tianjin University); Jianye Hao (Huawei Noah's Ark Lab); Chen Chen (Huawei Noah’s Ark Lab); Xidong Feng (Department of Automation, Tsinghua University; Huawei Noah's Ark Lab); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab)

  • [pdf] [video] PettingZoo: Gym for Multi-Agent Reinforcement Learning

    • Justin Terry (University of Maryland, College Park); Benjamin Black (UMD); Mario Jayakumar (University of Maryland, College Park); Ananth Hari (University of Maryland, College Park); Luis Santos (University of Maryland, College Park); Clemens Dieffendahl (Technical University of Berlin); Niall Williams (University of Maryland, College Park); Yashas Lokesh (University of Maryland, College Park); Caroline Horsch (University of Maryland, College Park); Praveen Ravi (University of Maryland, College Park); Ryan Sullivan (University of Maryland, College Park)

  • [pdf] [video] DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

    • Soroush Nasiriany (UC Berkeley); Vitchyr Pong (UC Berkeley); Ashvin Nair (UC Berkeley); Alexander Khazatsky (UC Berkeley); Glen Berseth (UC Berkeley); Sergey Levine (UC Berkeley)

  • [pdf] Multi-Agent Option Critic Architecture

    • Abhinav Gupta (Mila); Jhelum Chakravorty (McGill University); Jikun Kang (McGill University); Xue Liu (McGill University); Doina Precup (McGill University)

  • [pdf] [video] Measuring Visual Generalization in Continuous Control from Pixels

    • Jake Grigsby (University of Virginia); Yanjun Qi (University of Virginia)

  • [pdf] [video] Provably Efficient Policy Optimization via Thompson Sampling

    • Haque Ishfaq (Mila, McGill University); Zhuoran Yang (Princeton University); Andrei Lupu (Mila, McGill University); Viet Nguyen (Mila, McGill University); Lewis Liu (University of Montreal, Mila); Riashat Islam (Mila, McGill University); Zhaoran Wang (Northwestern); Doina Precup (McGill University)

  • [pdf] [video] Outcome-Driven Reinforcement Learning via Variational Inference

    • Tim G. J. Rudner (University of Oxford); Vitchyr Pong (UC Berkeley); Rowan McAllister (UC Berkeley); Yarin Gal (University of Oxford); Sergey Levine (UC Berkeley)

  • [pdf] [video] Policy Learning Using Weak Supervision

    • Jingkang Wang (Uber ATG, University of Toronto); Hongyi Guo (Shanghai Jiao Tong University); Zhaowei Zhu (UC Santa Cruz); Yang Liu (UC Santa Cruz)

  • [pdf] [supplementary material] [video] Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

    • Jun Yamada (University of Southern California); Youngwoon Lee (University of Southern California); Gautam Salhotra (University of Southern California); Karl Pertsch (University of Southern California); Max Pflueger (University of Southern California); Gaurav Sukhatme (University of Southern California); Joseph Lim (USC); Peter Englert (University of Southern California)

  • [pdf] [video] Discovery of Options via Meta-Gradients

    • Vivek Veeriah (University of Michigan); Tom Zahavy (DeepMind); Matteo Hessel (DeepMind); Zhongwen Xu (DeepMind); Junhyuk Oh (DeepMind); Iurii Kemaev (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (DeepMind)

  • [pdf] [supplementary material] [video] SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

    • Xiangjun Wang (inspir.ai); Junxiao Song (inspir.ai)

  • [pdf] [supplementary material] [video] Unsupervised Domain Adaptation for Visual Navigation

    • Shangda Li (Carnegie Mellon University); Devendra Singh Chaplot (Carnegie Mellon University); Yao-Hung Tsai (Carnegie Mellon University); Yue Wu (Carnegie Mellon University); Louis-Philippe Morency (Carnegie Mellon University); Ruslan Salakhutdinov (Carnegie Mellon University)

  • [pdf] [video] Continual Model-Based Reinforcement Learning with Hypernetworks

    • Yizhou Huang (University of Toronto); Kevin Xie (University of Toronto); Homanga Bharadhwaj (University of Toronto, Vector Institute); Florian Shkurti (University of Toronto)

  • [pdf] [supplementary material] [video] GRAC: Self-Guided and Self-Regularized Actor-Critic

    • Lin Shao (Stanford University); Yifan You (UCLA); Mengyuan Yan (Stanford University); Qingyun Sun (Stanford University); Jeannette Bohg (Stanford)

  • [pdf] [video] Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

    • Tanmay Gangwani (UIUC); Jian Peng (UIUC); Yuan Zhou (UIUC)

  • [pdf] [video] R-LAtte: Visual Control via Deep Reinforcement Learning with Attention Network

    • Mandi Zhao (UC Berkeley); Qiyang Li (University of California, Berkeley); Aravind Srinivas (UC Berkeley); Ignasi Clavera (UC Berkeley); Kimin Lee (UC Berkeley); Pieter Abbeel (UC Berkeley)

  • [pdf] [video] Domain Adversarial Reinforcement Learning

    • Bonnie Li (McGill); Vincent Francois-Lavet (McGill); Thang Doan (Mila / McGill); Joelle Pineau (McGill / Facebook)

  • [pdf] [supplementary material] [video] Latent State Models for Meta-Reinforcement Learning from Images

    • Anusha Nagabandi (UC Berkeley); Zihao Zhao (UC Berkeley); Kate Rakelly (UC Berkeley); Chelsea Finn (Stanford); Sergey Levine (UC Berkeley)

  • [pdf] [video] Learning Markov State Abstractions for Deep Reinforcement Learning

    • Cameron Allen (Brown University); Neev Parikh (Brown University); George Konidaris (Brown)

  • [pdf] [video] Backtesting Optimal Trade Execution Policies in Agent-Based Market Simulator

    • Siyu Lin (University of Virginia); Peter Beling (University of Virginia)

  • [pdf] [supplementary material] [video] Deep Bayesian Quadrature Policy Optimization

    • Ravi Tej Akella (Indian Institute of Technology Roorkee); Kamyar Azizzadenesheli (Purdue University); Mohammad Ghavamzadeh (Google Research); Animashree Anandkumar (Caltech); Yisong Yue (Caltech)

  • [pdf] [video] Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning

    • Sanghwa Lee (National Institute of Informatics); Jaeyoung Lee (University of Waterloo); Ichiro Hasuo (National Institute of Informatics & SOKENDAI)

  • [pdf] [supplementary material] [video] Value Generalization among Policies: Improving Value Function with Policy Representation

    • Hongyao Tang (Tianjin University); Zhaopeng Meng (School of Computer Software, Tianjin University); Jianye Hao (Tianjin University); Chen Chen (Huawei Noah’s Ark Lab); Daniel Graves (Huawei); Dong Li (Huawei Noah's Ark Lab); Wulong Liu (Huawei Noah's Ark Lab); Yaodong Yang (Huawei Noah's Ark Lab)

  • [pdf] [supplementary material] [video] Successor Landmarks for Efficient Exploration and Long-Horizon Navigation

    • Christopher Hoang (University of Michigan); Sungryull Sohn (University of Michigan); Jongwook Choi (University of Michigan); Wilka Carvalho (University of Michigan); Honglak Lee (University of Michigan / Google Research)

  • [pdf] [video] Policy Guided Planning in Learned Latent Space

    • Mohammad Amini (Mila, McGill University); Doina Precup (McGill University); Sarath Chandar (Mila)

Program Committee

We would like to thank the following people for their effort in making this year's edition of the Deep RL Workshop a success!

  • David Abel

  • Pulkit Agrawal

  • Maruan Al Shedivat

  • Marcin Andrychowicz

  • Dilip Arumugam

  • Glen Berseth

  • Diana Borsa

  • Ethan Brooks

  • Noam Brown

  • Roberto Calandra

  • Wilka Carvalho

  • Devendra Singh Chaplot

  • Veronica Chelu

  • Richard Chen

  • Jongwook Choi

  • Ignasi Clavera

  • Thomas Degris

  • Harri Edwards

  • Jess Farebrother

  • Jakob Foerster

  • Justin Fu

  • Yasuhiro Fujita

  • Shixiang Gu

  • Arthur Guez

  • Xiaoxiao Guo

  • Yijie Guo

  • Abhishek Gupta

  • David Ha

  • Tuomas Haarnoja

  • Danijar Hafner

  • Jessica Hamrick

  • Anna Harutyunyan

  • Karol Hausman

  • Rein Houthooft

  • Sandy Huang

  • Maximilian Igl

  • Riashat Islam

  • Max Jaderberg

  • Gregory Kahn

  • Khimya Khetarpal

  • Louis Kirsch

  • Ilya Kostrikov

  • Andrew Lampinen

  • Alex Lee

  • Lisa Lee

  • Ryan Lowe

  • Fangchen Liu

  • Qiyang Li

  • Rowan McAllister

  • Nikhil Mishra

  • Vlad Mnih

  • Aditya Modi

  • Igor Mordatch

  • Ofir Nachum

  • Anusha Nagabandi

  • Ashvin Nair

  • Suraj Nair

  • Karthik Narasimhan

  • Junhyuk Oh

  • Deepak Pathak

  • Xue Bin Peng

  • Lerrel Pinto

  • Vitchyr Pong

  • Aravind Rajeswaran

  • Sid Reddy

  • Oleh Rybkin

  • Tim Salimans

  • Tom Schaul

  • Pierre Sermanet

  • Rohin Shah

  • Archit Sharma

  • Max Smith

  • Sungryull Sohn

  • Aravind Srinivas

  • Bradly Stadie

  • Arthur Szlam

  • Aviv Tamar

  • Chen Tessler

  • Yuandong Tian

  • Sasha Vezhnevets

  • Risto Vuorio

  • Tony Wu

  • Yi Wu

  • Markus Wulfmeier

  • Ted Xiao

  • Zhongwen Xu

  • Huazhe Xu

  • Ge Yang

  • Denis Yarats

  • Tianhe Yu

  • Tom Zahavy

  • Marvin Zhang

  • Shangtong Zhang

  • Qi Zhang

  • Amy Zhang

  • Zeyu Zheng

  • Allan Zhou

  • Luisa Zintgraf