Reinforcement Learning for Real Life
ICML 2019 Workshop
June 14, 2019, Long Beach, CA, USA
Reinforcement learning (RL) is a general learning, predicting, and decision making paradigm. RL provides solution methods for sequential decision making problems, as well as for problems that can be transformed into sequential ones. RL connects deeply with optimization, statistics, game theory, causal inference, sequential experimentation, etc., overlaps largely with approximate dynamic programming and optimal control, and applies broadly in science, engineering, and the arts.
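The sequential decision making loop described above can be made concrete with a minimal tabular Q-learning sketch. The `env_step` interface below is a hypothetical stand-in for an environment, not a specific library API:

```python
import random

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning loop (illustrative sketch).

    env_step(s, a) -> (next_state, reward, done) is an assumed interface.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = env_step(s, a)
            # temporal-difference update toward the Bellman target
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q
```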
RL has been making steady progress in academia recently, e.g., Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real world scenarios such as recommender systems and neural architecture search. See a recent collection about RL applications. It is desirable to have RL systems that work in the real world with real benefits. However, RL still faces many issues, e.g., generalization, sample efficiency, and the exploration vs. exploitation dilemma; consequently, RL is far from being widely deployed. Common, critical, and pressing questions for the RL community are then: Will RL have wide deployments? What are the issues? How do we solve them?
The goal of this workshop is to bring together researchers and practitioners from industry and academia who are interested in addressing practical and/or theoretical issues in applying RL to real life scenarios. The workshop aims to review the state of the art, clarify impactful research problems, brainstorm open challenges, share first-hand lessons and experiences from real life deployments, summarize what has and has not worked, collect tips for people from industry looking to apply RL and for RL experts interested in applying their methods to real domains, identify potential opportunities, generate new ideas for future lines of research and development, and promote awareness and collaboration. This is not "yet another RL workshop": it is about how to successfully apply RL to real life applications. This issue has received relatively little attention in the RL/ML/AI community, and it calls for immediate attention for the sustainable prosperity of RL research and development.
Schedule
8:30-8:50 optional early-bird posters
8:50-9:00 opening remarks (Yuxi Li)
9:00-10:00 invited talks (chaired by Tao Wang)
- David Silver (DeepMind) (video)
- John Langford (Microsoft Research) (video)
- Craig Boutilier (Google Research) (video)
10:00-11:00 posters (coffee break 10:30-11:00)
11:00-12:00 panel discussion (moderated by Alborz Geramifard)
- Craig Boutilier (Google Research)
- Emma Brunskill (Stanford)
- Chelsea Finn (Google Brain, Stanford, UC Berkeley)
- Mohammad Ghavamzadeh (Facebook AI)
- John Langford (Microsoft Research)
- David Silver (DeepMind)
- Peter Stone (UT Austin, Cogitai)
12:00-12:05 closing remarks (Lihong Li)
12:05-12:30 optional posters
Invited Talks
Craig Boutilier
Title: Reinforcement Learning in Recommender Systems: Some Challenges (video)
Abstract: I'll present a brief overview of some recent work on reinforcement learning motivated by practical issues that arise in the application of RL to online, user-facing applications like recommender systems. These include stochastic action sets, long-term cumulative effects, and combinatorial action spaces. I'll provide some detail on the last of these, describing SlateQ, a novel decomposition technique that allows value-based RL (e.g., Q-learning) in slate-based recommender systems to scale to commercial production systems, and briefly describe both small-scale simulation and a large-scale experiment with YouTube. Joint work with various collaborators.
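The core idea of a SlateQ-style decomposition can be sketched in a few lines: under a user-choice model, the slate's long-term value decomposes into item-level Q-values weighted by click probabilities. The function name and the conditional-logit choice model below are illustrative assumptions, not the paper's exact formulation (which, e.g., also models a no-click option):

```python
def slate_q_value(item_scores, item_q_values):
    """Sketch of a SlateQ-style decomposition (illustrative only).

    item_scores: hypothetical user-choice scores v(s, i) for items on the slate
    item_q_values: long-term item-level values Q(s, i)
    Returns sum_i P(click i | slate) * Q(s, i), with click probabilities
    from a conditional-logit choice model over the slate.
    """
    total = sum(item_scores)  # normalizer of the choice model
    return sum(v / total * q for v, q in zip(item_scores, item_q_values))
```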
Bio: Craig is Principal Scientist at Google, working on various aspects of decision making under uncertainty (e.g., reinforcement learning, Markov decision processes, user modeling, preference modeling and elicitation) and recommender systems. He received his Ph.D. from the University of Toronto in 1992, and has held positions at the University of British Columbia, University of Toronto, CombineNet, and co-founded Granata Decision Systems.
Craig was Editor-in-Chief of JAIR; Associate Editor with ACM TEAC, JAIR, JMLR, and JAAMAS; Program Chair for IJCAI-09 and UAI-2000. Boutilier is a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM) and the Association for the Advancement of Artificial Intelligence (AAAI). He was recipient of the 2018 ACM/SIGAI Autonomous Agents Research Award and a Tier I Canada Research Chair; and has received (with great co-authors) a number of Best Paper awards including: the 2009 IJCAI-JAIR Best Paper Prize; the 2014 AIJ Prominent Paper Award; and the 2018 NeurIPS Best Paper Award.
John Langford
Title: How do we make Real World Reinforcement Learning revolution? (video)
Abstract: Doing Real World Reinforcement Learning implies living with steep constraints on the sample complexity of solutions. Where is this viable? Where might it be viable in the near future? In the far future? How can we design a research program around identifying and building such solutions? In short, what are the missing elements we need to really make reinforcement learning more mundane and commonly applied than Supervised Learning? The potential is certainly there given the naturalness of RL compared to supervised learning, but the present is manifestly different.
Bio: https://en.wikipedia.org/wiki/John_Langford_(computer_scientist)
David Silver
Title: AlphaStar: Mastering the Game of StarCraft II (video)
Abstract: In recent years, the real-time strategy game of StarCraft has emerged by consensus as an important challenge for AI research. It combines several major difficulties that are intractable for many existing algorithms: a large, structured action space; imperfect information about the opponent; a partially observed map; and cycles in the strategy space. Each of these challenges represents a major difficulty faced by real-world applications, for example those based on internet-scale action spaces, game theory in e.g. security, point-and-click interfaces, or robust AI in the presence of diverse and potentially exploitative user strategies. Here, we introduce AlphaStar: a novel combination of deep learning and reinforcement learning that mastered this challenging domain and defeated human professional players for the first time.
Bio: https://en.wikipedia.org/wiki/David_Silver_(programmer)
Papers/Posters
We have papers/posters from submissions and also by invitation. We organize them into the following categories: best papers; position papers; benchmark/toolbox papers; applications papers about production systems, autonomous driving, business management, chemistry, computer systems, energy, healthcare, and robotics/manufacturing; and algorithm/theory papers on various topics, including bandits, off-policy learning, and safety. Posters are available for some papers. Submitted papers are available here.
Best Papers
Lyapunov-based Safe Policy Optimization for Continuous Control
Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, Edgar Duenez-Guzman
Challenges of Real-World Reinforcement Learning (poster)
Gabriel Dulac-Arnold, Daniel Mankowitz, Todd Hester
Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen
Park: An Open Platform for Learning Augmented Computer Systems (poster)
Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
Position Papers
Challenges of Real-World Reinforcement Learning
Gabriel Dulac-Arnold, Daniel Mankowitz, Todd Hester
Lessons from Contextual Bandit Learning in a Customer Support Bot
Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen
Benchmarks/Toolbox Papers
VRKitchen: an Interactive 3D Environment for Learning Real Life Cooking Tasks (poster)
Xiaofeng Gao, Ran Gong, Tianmin Shu, Xu Xie, Shu Wang, Song-Chun Zhu
Park: An Open Platform for Learning Augmented Computer Systems (poster)
Hongzi Mao, Akshay Narayan, Parimarjan Negi, Hanrui Wang, Jiacheng Yang, Haonan Wang, Mehrdad Khani, Songtao He, Ravichandra Addanki, Ryan Marcus, Frank Cangialosi, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
Reinforcement Learning for Sepsis Treatment: Baselines and Analysis
Aniruddh Raghu
Applications Papers
Applications: Production Systems
Top-K Off-Policy Correction for a REINFORCE Recommender System (invited poster, WSDM 2019 video)
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed Chi
Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen
Data center cooling using model-predictive control (invited poster, NeurIPS 2018)
Nevena Lazic, Tyler Lu, Craig Boutilier, Moonkyung Ryu, Eehern Wong, Binz Roy, Greg Imwalle
Real-world Video Adaptation with Reinforcement Learning (poster)
Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy
Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation (invited poster, KDD 2019) (poster)
Wenjie Shang, Yang Yu, Qingyang Li, Zhiwei Qin, Yiping Meng and Jieping Ye
A Deep Value-network Based Approach for Multi-Driver Order Dispatching (invited poster, KDD 2019) (poster)
Xiaocheng Tang, Zhiwei Qin, Fan Zhang, Zhaodong Wang, Zhe Xu, Yintai Ma, Hongtu Zhu and Jieping Ye
Applications: Autonomous Driving
Real-World Autonomous Vehicle Control Trained Entirely within Data-Driven Simulation
Alexander Amini, Igor Gilitschenski, Jacob Phillips, Julia Moseyko, Sertac Karaman, Daniela Rus
Applications: Business Management
Autonomous Air Traffic Controller: A Deep Multi-Agent Reinforcement Learning Approach
Marc Brittain, Peng Wei
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System (invited poster, ICML 2019) (poster)
Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, Le Song
RecSim — A Configurable Recommender Systems Environment (invited poster) (poster)
Eugene Ie, Chih-wei Hsu, Martin Mladenov, Sanmit Narvekar, Jing Conan Wang, Rui Wu, Vihan Jain, Craig Boutilier
RetailNet: Enhancing Retails of Perishable Products with Multiple Selling Strategies via Pair-Wise Multi-Q Learning
Xiyao Ma, Fan Lu, Xiajun Pan, Yanlin Zhou, Xiaolin Li
Syed Arbab Mohd Shihab, Caleb Logemann, Deepak-George Thomas, Peng Wei
A Reinforcement Learning Approach for Joint Replenishment Policy in Multi-Product Inventory System
Hiroshi Suetsugu, Yoshiaki Narusue, Hiroyuki Morikawa
Reinforcement learning in maintenance of civil infrastructures
Shiyin Wei, Hui Li
Applications: Chemistry
Optimizing 3D structure of H2O molecule using DDPG (poster)
Soo Kyung Kim, Peggy Li, Joanne Taery Kim, Piyush Karande, Yong Han
Chemical Synthesis Planning via Reinforcement Learning and its Implications for Drug Discovery (invited poster)
Marwin Segler
Applications: Computer Systems
Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation
Byung Hoon Ahn, Prannoy Pilligundla, Hadi Esmaeilzadeh
SmartChoices: Hybridizing Programming and Machine Learning
Victor Carbune, Thierry Coppey, Alexander Daryin, Thomas Deselaers, Nikhil Sarda, Jay Yagnik
A Deep Reinforcement Learning Perspective on Internet Congestion Control (ICML 2019)
Nathan Jay, Noga H. Rotman, Brighten Godfrey, Michael Schapira, Aviv Tamar
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling
Andrey Kolobov, Yuval Peres, Cheng Lu, Eric Horvitz
Meta-reasoning in Modular Software Systems via Reinforcement Learning (invited poster) (poster)
Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
Applications: Energy
Deep Reinforcement Learning for Continuous Power Allocation in Flexible High Throughput Satellites (poster)
Juan Jose Garau Luis, Markus Guerster, Edward Crawley, Bruce Cameron
Applications: Healthcare
Dynamic Measurement Scheduling for Event Forecasting using Deep RL (invited poster, ICML 2019)
Chun-Hao Chang, Mingjie Mai, Anna Goldenberg
Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs (invited poster)
Luchen Li, Matthieu Komorowski and Aldo A. Faisal
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity (invited poster)
Peng Liao, Kristjan Greenwald, Predrag Klasnja and Susan Murphy
Crowdsourcing Reinforcement Learning to Optimize Knee Replacement Pathway
Hao Lu, Mengdi Wang
Reinforcement Learning for Blood Glucose Control: Challenges and Opportunities (poster)
Ian Fox, Jenna Wiens
Intelligent Pooling in Thompson Sampling for Rapid Personalization in Mobile Health
Sabina Tomkins, Peng Liao, Serena Yeung, Predrag Klasnja, Susan Murphy
Applications: Robotics/Manufacturing
Gerrit Schoettler, Ashvin Nair, Jianlan Luo, Shikhar Bahl, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine
Algorithm & Theory
Improving the Generalization of Visual Navigation Policies using Invariance Regularization (poster)
Michel Aractingi, Christopher Dance, Julien Perez, Tomi Silander
Curious iLQR: Resolving Uncertainty in Model-based RL
Sarah Bechtle, Akshara Rai, Yixin Lin, Ludovic Righetti, Franziska Meier
Deep Knowledge Based Agent: Learning to do tasks by self-thinking about imaginary worlds
Ali Davody
P3O: Policy-on Policy-off Policy Optimization
Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola
Contextual Markov Decision Processes using Generalized Linear Models (poster)
Aditya Modi, Ambuj Tewari
Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin
Fast Efficient Hyperparameter Tuning for Policy Gradients (poster)
Supratik Paul, Vitaly Kurin, Shimon Whiteson
Q-Learning for Continuous Actions with Cross-Entropy Guided Policies (poster)
Riley Simmons-Edler, Ben Eisner, Eric Mitchell, Sebastian Seung, Daniel Lee
R-MADDPG for Partially Observable Environments and Limited Communication
Rose E. Wang, Michael Everett, Jonathan P. How
Algorithm & Theory : Bandits
A Contextual Bandit Bake-off (invited poster)
Alberto Bietti, Alekh Agarwal, John Langford
Optimal Exploitation of Clustering and History Information in Multi-armed Bandit Problem
Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistuba
Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits (poster)
Subhojyoti Mukherjee, Odalric Maillard
Multinomial Logit Contextual Bandits
Min-hwan Oh, Garud Iyengar
Algorithm & Theory : Off-Policy Learning
Off-Policy Evaluation via Off-Policy Classification (poster)
Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz, Sergey Levine
Off-Policy Policy Gradient with State Distribution Correction (invited poster)
Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
DualDICE: Efficient Estimation of Off-Policy Stationary Distribution Corrections (poster)
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
Algorithm & Theory : Safety
Lyapunov-based Safe Policy Optimization for Continuous Control
Yinlam Chow, Ofir Nachum, Aleksandra Faust, Mohammad Ghavamzadeh, Edgar Duenez-Guzman
Distributionally Robust Reinforcement Learning
Elena Smirnova, Elvis Dohmatob, Jérémie Mary
Call for Papers
The main goals of the workshop are to: (1) have experts share their successful stories of applying RL to real-world problems; and (2) identify research sub-areas critical for real-world applications such as reliable evaluation, benchmarking, and safety/robustness.
We invite paper submissions that successfully apply RL and relevant algorithms to real life applications by addressing relevant RL issues. Under the central theme of making RL work in real life scenarios, no further constraints are set, in order to facilitate open discussion and to foster creativity and imagination from the community. We will prioritize work that proposes interesting and impactful contributions. Our technical topics of interest are general, including but not limited to the concrete topics below:
- RL and relevant algorithms: value-based, policy-based, model-free, model-based, online, offline, on-policy, off-policy, hierarchical, multi-agent, relational, multi-armed bandit, (linear, nonlinear, deep/neural, symbolic) representation learning, unsupervised learning, self-supervised learning, transfer learning, sim-to-real, multi-task learning, meta-learning, imitation learning, continual learning, causal inference, and reasoning;
- Issues: generalization, deadly triad, sample/time/space efficiency, exploration vs. exploitation, reward specification, stability, convergence, scalability, model-based learning (model validation and model error estimation), prior knowledge, safety, interpretability, reproducibility, hyperparameter tuning, and boilerplate code;
- Applications: recommender systems, advertisements, conversational AI, business, finance, healthcare, education, robotics, autonomous driving, transportation, energy, chemical synthesis, drug design, industry control, drawing, music, and other problems in science, engineering and arts.
We warmly welcome position papers.
We invite unpublished submissions up to 8 pages excluding references, in PDF format using the ICML 2019 template and style guidelines. We are open to papers currently under review at other venues. Submission is single-blind. All accepted papers will be presented as posters, and a few of them will be selected for spotlight presentations. There will be no proceedings for this workshop. However, accepted contributions will be made available on the workshop website, unless authors opt out. The submission website is: https://openreview.net/group?id=ICML.cc/2019/Workshop/RL4RealLife.
Important dates:
- Submission deadline: May 5, 2019 (23:59 EST)
- Author notification: May 28, 2019
- Final submission: June 3, 2019
Info for Posters:
All posters will be presented inside the workshop room. There are no poster boards at workshops; posters are taped to the wall. Posters should be on lightweight paper, not laminated. Please make posters 36W x 48H inches or 90 x 122 cm, and follow the specification on the ICML website ("Please ask your presenters to make their posters 24W x 36H inches or 61 x 91 cm.").
Final version:
Style files for our workshop (customized from those of ICML 2019).
Instructions: Starting from the ICML 2019 submission style files, 1) change \usepackage{icml2019} to \usepackage[accepted]{icml2019} in your .tex file for the final style; and 2) use our customized icml2019.sty file for the footnote of our workshop.
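For example, the preamble change in step 1) looks like this (the comments are ours; the package name comes from the ICML 2019 style files):

```latex
% submission version (anonymous, no footnote)
\usepackage{icml2019}

% final version: pass the [accepted] option, using the
% workshop's customized icml2019.sty for the footnote
\usepackage[accepted]{icml2019}
```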
Program Committee Members
David Abel (Brown University)
Omid Ardakanian (University of Alberta)
Kamyar Azizzadenesheli (Purdue University)
Justin Basilico (Netflix)
Victor Carbune (Google Research)
Minmin Chen (Google Research)
Yinlam Chow (Google Research)
Bo Dai (Google Research)
Christoph Dann (Carnegie Mellon University)
Gabriel Dulac-Arnold (Google Research)
Ben Eisner (Samsung Research)
Rasool Fakoor (Amazon)
Xiaofeng Gao (University of California, Los Angeles)
Todd Hester (DeepMind)
Nikos Karampatziakis (Microsoft)
Soo Kyung Kim (Lawrence Livermore National Labs)
Andrey Kolobov (Microsoft)
Branislav Kveton (Google Research)
Minhae Kwon (Rice University)
Peng Liao (Harvard University)
Xin Liu (University of California, Davis)
Yao Liu (Stanford University)
Hongzi Mao (Massachusetts Institute of Technology)
Ofir Nachum (Google Research)
Zhiwei Tony Qin (Didi Chuxing)
Marwin Segler (BenevolentAI)
Jun Wang (University College London)
Zhipeng Wang (Apple)
Peng Wei (Iowa State University)
Hengshuai Yao (Huawei Technologies)
Kai Yu (Shanghai Jiao Tong University)
Yang Yu (Nanjing University)
Quan Yuan (Inspir.AI)
Shangtong Zhang (University of Oxford)
Xiangyu Zhao (Michigan State University)
Contact/Communication
Email: rl4reallife@gmail.com | Slack | Twitter: #RL4RealLife