Reinforcement Learning for Real Life

ICML 2019 Workshop

June 14, 2019, Long Beach, CA, USA

Reinforcement learning (RL) is a general learning, predicting, and decision-making paradigm. RL provides solution methods for sequential decision-making problems, as well as for problems that can be transformed into sequential ones. RL connects deeply with optimization, statistics, game theory, causal inference, sequential experimentation, etc., overlaps largely with approximate dynamic programming and optimal control, and applies broadly in science, engineering, and the arts.

RL has made steady progress in academia recently, e.g., on Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real-world scenarios such as recommender systems and neural architecture search. See a recent collection of RL applications. It is desirable to have RL systems that work in the real world with real benefits. However, RL still faces many issues, e.g., generalization, sample efficiency, and the exploration vs. exploitation dilemma. Consequently, RL is far from being widely deployed. Common, critical, and pressing questions for the RL community are then: Will RL be widely deployed? What are the issues? How can they be solved?

The goal of this workshop is to bring together researchers and practitioners from industry and academia who are interested in addressing practical and/or theoretical issues in applying RL to real-life scenarios. We aim to review the state of the art, clarify impactful research problems, brainstorm open challenges, share first-hand lessons and experiences from real-life deployments, summarize what has worked and what has not, collect tips both for people from industry looking to apply RL and for RL experts interested in applying their methods to real domains, identify potential opportunities, generate new ideas for future lines of research and development, and promote awareness and collaboration. This is not "yet another RL workshop": it is about how to successfully apply RL to real-life applications. This issue is less addressed in the RL/ML/AI community and calls for immediate attention to ensure the sustainable prosperity of RL research and development.


8:30-8:50 optional early-bird posters

8:50-9:00 opening remarks (Yuxi Li)

9:00-10:00 invited talks (chaired by Tao Wang)

  • David Silver (DeepMind) (video)
  • John Langford (Microsoft Research) (video)
  • Craig Boutilier (Google Research) (video)

10:00-11:00 posters (coffee break 10:30-11:00)

11:00-12:00 panel discussion (moderated by Alborz Geramifard)

  • Craig Boutilier (Google Research)
  • Emma Brunskill (Stanford)
  • Chelsea Finn (Google Brain, Stanford, UC Berkeley)
  • Mohammad Ghavamzadeh (Facebook AI)
  • John Langford (Microsoft Research)
  • David Silver (DeepMind)
  • Peter Stone (UT Austin, Cogitai)

12:00-12:05 closing remarks (Lihong Li)

12:05-12:30 optional posters

Invited Talks

Craig Boutilier

Title: Reinforcement Learning in Recommender Systems: Some Challenges (video)

Abstract: I'll present a brief overview of some recent work on reinforcement learning motivated by practical issues that arise when applying RL to online, user-facing applications like recommender systems. These include stochastic action sets, long-term cumulative effects, and combinatorial action spaces. I'll provide some detail on the last of these, describing SlateQ, a novel decomposition technique that allows value-based RL (e.g., Q-learning) in slate-based recommenders to scale to commercial production systems, and briefly describe both a small-scale simulation and a large-scale experiment with YouTube. Joint work with various collaborators.

Bio: Craig is a Principal Scientist at Google, working on various aspects of decision making under uncertainty (e.g., reinforcement learning, Markov decision processes, user modeling, preference modeling and elicitation) and recommender systems. He received his Ph.D. from the University of Toronto in 1992, has held positions at the University of British Columbia, the University of Toronto, and CombineNet, and co-founded Granata Decision Systems.

Craig was Editor-in-Chief of JAIR; Associate Editor with ACM TEAC, JAIR, JMLR, and JAAMAS; and Program Chair for IJCAI-09 and UAI-2000. Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM), and the Association for the Advancement of Artificial Intelligence (AAAI). He was the recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award and a Tier I Canada Research Chair, and has received (with great co-authors) a number of Best Paper awards, including the 2009 IJCAI-JAIR Best Paper Prize, the 2014 AIJ Prominent Paper Award, and the 2018 NeurIPS Best Paper Award.

John Langford

Title: How do we make Real World Reinforcement Learning revolution? (video)

Abstract: Doing real-world reinforcement learning implies living with steep constraints on the sample complexity of solutions. Where is this viable? Where might it be viable in the near future? In the far future? How can we design a research program around identifying and building such solutions? In short, what are the missing elements we need to really make reinforcement learning more mundane and more commonly applied than supervised learning? The potential is certainly there, given the naturalness of RL compared to supervised learning, but the present is manifestly different.


David Silver

Title: AlphaStar: Mastering the Game of StarCraft II (video)

Abstract: In recent years, the real-time strategy game of StarCraft has emerged by consensus as an important challenge for AI research. It combines several major difficulties that are intractable for many existing algorithms: a large, structured action space; imperfect information about the opponent; a partially observed map; and cycles in the strategy space. Each of these challenges represents a major difficulty faced by real-world applications, for example those based on internet-scale action spaces, game theory (e.g., in security), point-and-click interfaces, or robust AI in the presence of diverse and potentially exploitative user strategies. Here, we introduce AlphaStar: a novel combination of deep learning and reinforcement learning that mastered this challenging domain and defeated human professional players for the first time.



Papers and posters come from both submissions and invitations. We organize them into the following categories: best papers; position papers; benchmark/toolbox papers; application papers on production systems, autonomous driving, business management, chemistry, computer systems, energy, healthcare, and robotics/manufacture; and algorithm/theory papers on general topics, bandits, off-policy learning, and safety. Posters are available for some papers. Submitted papers are available here.

Best Papers

Lyapunov-based Safe Policy Optimization for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, Edgar Duenez-Guzman

Challenges of Real-World Reinforcement Learning (poster)

Gabriel Dulac-Arnold, Daniel Mankowitz, Todd Hester

Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen

Park: An Open Platform for Learning Augmented Computer Systems (poster)

Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh

Position Papers

Challenges of Real-World Reinforcement Learning

Gabriel Dulac-Arnold, Daniel Mankowitz, Todd Hester

Lessons from Contextual Bandit Learning in a Customer Support Bot

Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

Benchmarks/Toolbox Papers

VRKitchen: an Interactive 3D Environment for Learning Real Life Cooking Tasks (poster)

Xiaofeng Gao, Ran Gong, Tianmin Shu, Xu Xie, Shu Wang, Song-Chun Zhu

Park: An Open Platform for Learning Augmented Computer Systems (poster)

Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh

Reinforcement Learning for Sepsis Treatment: Baselines and Analysis

Aniruddh Raghu

Applications Papers

Applications: Production Systems

Top-K Off-Policy Correction for a REINFORCE Recommender System (invited poster, WSDM 2019 video)

Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed Chi

Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen

Data center cooling using model-predictive control (invited poster, NeurIPS 2018)

Nevena Lazic, Tyler Lu, Craig Boutilier, Moonkyung Ryu, Eehern Wong, Binz Roy, Greg Imwalle

Real-world Video Adaptation with Reinforcement Learning (poster)

Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy

Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation (invited poster, KDD 2019) (poster)

Wenjie Shang, Yang Yu, Qingyang Li, Zhiwei Qin, Yiping Meng and Jieping Ye

A Deep Value-network Based Approach for Multi-Driver Order Dispatching (invited poster, KDD 2019) (poster)

Xiaocheng Tang, Zhiwei Qin, Fan Zhang, Zhaodong Wang, Zhe Xu, Yintai Ma, Hongtu Zhu and Jieping Ye

Applications: Autonomous Driving

Real-World Autonomous Vehicle Control Trained Entirely within Data-Driven Simulation

Alexander Amini, Igor Gilitschenski, Jacob Phillips, Julia Moseyko, Sertac Karaman, Daniela Rus

Applications: Business Management

Autonomous Air Traffic Controller: A Deep Multi-Agent Reinforcement Learning Approach

Marc Brittain, Peng Wei

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System (invited poster, ICML 2019) (poster)

Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, Le Song

RecSim — A Configurable Recommender Systems Environment (invited poster) (poster)

Eugene Ie, Chih-wei Hsu, Martin Mladenov, Sanmit Narvekar, Jing Conan Wang, Rui Wu, Vihan Jain, Craig Boutilier

RetailNet: Enhancing Retails of Perishable Products with Multiple Selling Strategies via Pair-Wise Multi-Q Learning

Xiyao Ma, Fan Lu, Xiajun Pan, Yanlin Zhou, Xiaolin Li

Autonomous Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking

Syed Arbab Mohd Shihab, Caleb Logemann, Deepak-George Thomas, Peng Wei

A Reinforcement Learning Approach for Joint Replenishment Policy in Multi-Product Inventory System

Hiroshi Suetsugu, Yoshiaki Narusue, Hiroyuki Morikawa

Reinforcement learning in maintenance of civil infrastructures

Shiyin Wei, Hui Li

Applications: Chemistry

Optimizing 3D structure of H2O molecule using DDPG (poster)

Soo Kyung Kim, Peggy Li, Joanne Taery Kim, Piyush Karande, Yong Han

Chemical Synthesis Planning via Reinforcement Learning and its Implications for Drug Discovery (invited poster)

Marwin Segler

Applications: Computer Systems

Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation

Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh

SmartChoices: Hybridizing Programming and Machine Learning

Victor Carbune, Thierry Coppey, Alexander Daryin, Thomas Deselaers, Nikhil Sarda, Jay Yagnik

A Deep Reinforcement Learning Perspective on Internet Congestion Control (ICML 2019)

Nathan Jay, Noga H. Rotman, Brighten Godfrey, Michael Schapira, Aviv Tamar

Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

Andrey Kolobov, Yuval Peres, Cheng Lu, Eric Horvitz

Meta-reasoning in Modular Software Systems via Reinforcement Learning (invited poster) (poster)

Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

Applications: Energy

Deep Reinforcement Learning for Continuous Power Allocation in Flexible High Throughput Satellites (poster)

Juan Jose Garau Luis, Markus Guerster, Edward Crawley, Bruce Cameron

Applications: Healthcare

Dynamic Measurement Scheduling for Event Forecasting using Deep RL (invited poster, ICML 2019)

Chun-Hao Chang, Mingjie Mai, Anna Goldenberg

Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs (invited poster)

Luchen Li, Matthieu Komorowski and Aldo A. Faisal

Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity (invited poster)

Peng Liao, Kristjan Greenwald, Predrag Klasnja and Susan Murphy

Crowdsourcing Reinforcement Learning to Optimize Knee Replacement Pathway

Hao Lu, Mengdi Wang

Reinforcement Learning for Blood Glucose Control: Challenges and Opportunities (poster)

Ian Fox, Jenna Wiens

Intelligent Pooling in Thompson Sampling for Rapid Personalization in Mobile Health

Sabina Tomkins, Peng Liao, Serena Yeung, Predrag Klasnja, Susan Murphy

Applications: Robotics/Manufacture

Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Reward Signals

Gerrit Schoettler, Ashvin Nair, Jianlan Luo, Shikhar Bahl, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine

Algorithm & Theory

Improving the Generalization of Visual Navigation Policies using Invariance Regularization (poster)

Michel Aractingi, Christopher Dance, Julien Perez, Tomi Silander

Curious iLQR: Resolving Uncertainty in Model-based RL

Sarah Bechtle, Akshara Rai, Yixin Lin, Ludovic Righetti, Franziska Meier

Deep Knowledge Based Agent: Learning to do tasks by self-thinking about imaginary worlds

Ali Davody

P3O: Policy-on Policy-off Policy Optimization

Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

Contextual Markov Decision Processes using Generalized Linear Models (poster)

Aditya Modi, Ambuj Tewari

Addressing Sample Complexity in Visual Tasks Using Hindsight Experience Replay and Hallucinatory GANs

Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin

Fast Efficient Hyperparameter Tuning for Policy Gradients (poster)

Supratik Paul, Vitaly Kurin, Shimon Whiteson

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies (poster)

Riley Simmons-Edler, Ben Eisner, Eric Mitchell, Sebastian Seung, Daniel Lee

R-MADDPG for Partially Observable Environments and Limited Communication

Rose E. Wang, Michael Everett, Jonathan P. How

Algorithm & Theory: Bandits

A Contextual Bandit Bake-off (invited poster)

Alberto Bietti, Alekh Agarwal, John Langford

Optimal Exploitation of Clustering and History Information in Multi-armed Bandit Problem

Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistuba

Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits (poster)

Subhojyoti Mukherjee, Odalric Maillard

Multinomial Logit Contextual Bandits

Min-hwan Oh, Garud Iyengar

Algorithm & Theory: Off-Policy Learning

Off-Policy Evaluation via Off-Policy Classification (poster)

Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine

Off-Policy Policy Gradient with State Distribution Correction (invited poster)

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections (poster)

Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li

Algorithm & Theory: Safety

Lyapunov-based Safe Policy Optimization for Continuous Control

Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, Edgar Duenez-Guzman

Distributionally Robust Reinforcement Learning (safety)

Elena Smirnova, Elvis Dohmatob, Jérémie Mary

Call for Papers

The main goals of the workshop are to: (1) have experts share their success stories of applying RL to real-world problems; and (2) identify research sub-areas critical for real-world applications, such as reliable evaluation, benchmarking, and safety/robustness.

We invite paper submissions that successfully apply RL and related algorithms to real-life applications by addressing relevant RL issues. Under the central theme of making RL work in real-life scenarios, no further constraints are set, so as to facilitate open discussion and to foster as much creativity and imagination from the community as possible. We will prioritize work that proposes interesting and impactful contributions. Our technical topics of interest are broad, including but not limited to the concrete topics below:

  • RL and relevant algorithms: value-based, policy-based, model-free, model-based, online, offline, on-policy, off-policy, hierarchical, multi-agent, relational, multi-armed bandit, (linear, nonlinear, deep/neural, symbolic) representation learning, unsupervised learning, self-supervised learning, transfer learning, sim-to-real, multi-task learning, meta-learning, imitation learning, continual learning, causal inference, and reasoning;
  • Issues: generalization, deadly triad, sample/time/space efficiency, exploration vs. exploitation, reward specification, stability, convergence, scalability, model-based learning (model validation and model error estimation), prior knowledge, safety, interpretability, reproducibility, hyperparameter tuning, and boilerplate code;
  • Applications: recommender systems, advertisements, conversational AI, business, finance, healthcare, education, robotics, autonomous driving, transportation, energy, chemical synthesis, drug design, industrial control, drawing, music, and other problems in science, engineering, and the arts.

We warmly welcome position papers.

We invite unpublished submissions up to 8 pages excluding references, in PDF format using the ICML 2019 template and style guidelines. We are open to papers currently under review at other venues. Submission is single-blind. All accepted papers will be presented as posters, and a few of them will be selected for spotlight presentations. There will be no proceedings for this workshop. However, accepted contributions will be made available on the workshop website, unless authors opt out. The submission website is:

Important dates:

  • Submission deadline: May 5, 2019 (23:59 EST)
  • Author notification: May 28, 2019
  • Final submission: June 3, 2019

Info for Posters:

All posters will be presented in the workshop room. There are no poster boards at workshops; posters are taped to the wall. Posters should be on lightweight paper, not laminated. Please make posters 36W x 48H inches or 90 x 122 cm. Please follow the specification on the ICML website. ("Please ask your presenters to make their posters 24W x 36H inches or 61 x 91 cm.")

Final version:

Style files for our workshop (customized from those of ICML 2019).

Instructions: Starting from the ICML 2019 submission style files, (1) change \usepackage{icml2019} to \usepackage[accepted]{icml2019} in your .tex file to get the final style; and (2) use our customized icml2019.sty file, which provides the footnote for our workshop.
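As a minimal sketch, assuming an otherwise unmodified ICML 2019 template, the final-version change amounts to one line in the preamble. The file name paper.tex in the comment is hypothetical. The one assumption beyond the instructions above is that the customized icml2019.sty is placed in the same directory as your .tex file. LaTeX typically searches that directory before any system-wide copy, so the workshop version is loaded in place of the stock one.

```latex
% paper.tex (hypothetical file name): preamble of a paper using the ICML 2019 template

% Submission version, as in the stock template:
% \usepackage{icml2019}

% Final version: pass the [accepted] option.
% Assumption: the workshop's customized icml2019.sty sits next to this file,
% so it is picked up instead of the stock ICML 2019 style and adds the workshop footnote.
\usepackage[accepted]{icml2019}
```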

Program Committee Members

David Abel (Brown University)

Omid Ardakanian (University of Alberta)

Kamyar Azizzadenesheli (Purdue University)

Justin Basilico (Netflix)

Victor Carbune (Google Research)

Minmin Chen (Google Research)

Yinlam Chow (Google Research)

Bo Dai (Google Research)

Christoph Dann (Carnegie Mellon University)

Gabriel Dulac-Arnold (Google Research)

Ben Eisner (Samsung Research)

Rasool Fakoor (Amazon)

Xiaofeng Gao (University of California, Los Angeles)

Todd Hester (DeepMind)

Nikos Karampatziakis (Microsoft)

Soo Kyung Kim (Lawrence Livermore National Labs)

Andrey Kolobov (Microsoft)

Branislav Kveton (Google Research)

Minhae Kwon (Rice University)

Peng Liao (Harvard University)

Xin Liu (University of California, Davis)

Yao Liu (Stanford University)

Hongzi Mao (Massachusetts Institute of Technology)

Ofir Nachum (Google Research)

Zhiwei Tony Qin (Didi Chuxing)

Marwin Segler (BenevolentAI)

Jun Wang (University College London)

Zhipeng Wang (Apple)

Peng Wei (Iowa State University)

Hengshuai Yao (Huawei Technologies)

Kai Yu (Shanghai Jiao Tong University)

Yang Yu (Nanjing University)

Quan Yuan (Inspir.AI)

Shangtong Zhang (University of Oxford)

Xiangyu Zhao (Michigan State University)



Email: Slack LinkedIn Group Twitter: #RL4RealLife

Highlighted by VentureBeat.