AAAI 2023
Reinforcement Learning Ready for Production Workshop

 Towards bringing reinforcement learning to production 

 

Summary

Reinforcement learning aims to solve sequential decision problems through observation of and interaction with an environment, human demonstrations, or logged data from historical interactions. It has demonstrated great success in simulated environments and games with clear rules and boundaries, such as Atari games, Go, Shogi, poker, and many robotic simulations. Success in these highly complex environments suggests that reinforcement learning can scale to real-world scenarios. Recently, there has been growing interest in bringing reinforcement learning to production and optimizing beyond immediate consequences in real-life problems. The 1st Reinforcement Learning Ready for Production workshop, held at AAAI 2023, focuses on reinforcement learning trends and algorithmic developments that bridge the gap between theoretical reinforcement learning and production environments. 

Schedule (2/14/2023, Washington D.C., Eastern Time)

Room 146B at the AAAI Conference Venue, the Walter E. Washington Convention Center in Washington DC

Virtual Participation (Registration Required): https://underline.io/events/390/sessions?eventSessionId=+14197

Early Social Time

9:30 - 9:45 AM

Opening Remarks

9:45 - 10:00 AM

Zheqing (Bill) Zhu

Meta AI / Stanford University

Trials and Tribulations: Ensuring the Oralytics RL Algorithm is Ready for Production!

10:00 - 11:00 AM

Abstract: Dental disease continues to be one of the most prevalent chronic diseases in the United States. Recent advances in digital technology, including electric toothbrushes and smartphones, offer much potential for promoting quality tooth-brushing in real-time, real-world settings. Sensors in electric toothbrushes and smartphones provide brushing data, and matching mobile apps deliver on-demand feedback and educational information. Behavioral scientists have developed a variety of prompts to encourage engagement with the app and the associated tooth-brushing behaviors. Here we describe the multiple stages of development of an online reinforcement learning algorithm for learning which type of prompt (or no prompt), in which state, is most effective in promoting quality tooth-brushing. Challenges in designing and testing an RL algorithm in these real-life health settings include developing a testbed from incomplete data, ensuring the RL algorithm can learn and run stably under a variety of constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. Throughout, we apply a generalization of the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to design and test the Oralytics RL algorithm. The Oralytics RL algorithm is currently being pilot tested, with the goal of starting the clinical trial in March 2023.
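The talk describes the algorithm design at a high level; as a rough illustration of the kind of online decision-making involved, here is a minimal Thompson-sampling contextual bandit for the prompt / no-prompt decision. This is a hypothetical sketch, not the actual Oralytics algorithm: the features, prior, and reward below are made up.

```python
# Hypothetical sketch: Thompson-sampling contextual bandit for the
# prompt / no-prompt decision. NOT the actual Oralytics algorithm.
import numpy as np

class LinearTS:
    """Bayesian linear regression with a N(0, sigma2/lam * I) prior per action."""

    def __init__(self, n_features, n_actions=2, sigma2=1.0, lam=1.0):
        self.sigma2 = sigma2
        # Posterior precision matrix and data vector for each action.
        self.precision = [lam * np.eye(n_features) for _ in range(n_actions)]
        self.b = [np.zeros(n_features) for _ in range(n_actions)]

    def act(self, x):
        """Sample weights from each action's posterior; pick the best action."""
        scores = []
        for P, b in zip(self.precision, self.b):
            mean = np.linalg.solve(P, b)
            cov = self.sigma2 * np.linalg.inv(P)
            w = np.random.multivariate_normal(mean, cov)
            scores.append(w @ x)
        return int(np.argmax(scores))

    def update(self, x, action, reward):
        """Standard conjugate Bayesian linear-regression update."""
        self.precision[action] += np.outer(x, x)
        self.b[action] += reward * x

# Usage: x encodes the user's state (e.g., time of day, recent brushing).
agent = LinearTS(n_features=4)
x = np.array([1.0, 0.3, 0.0, 1.0])   # hypothetical state features
a = agent.act(x)                     # 0 = no prompt, 1 = send prompt
agent.update(x, a, reward=1.0)       # reward: observed brushing quality
```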

Bio: Susan Murphy's research focuses on improving sequential, individualized decision making in digital health. She developed the micro-randomized trial for use in constructing digital health interventions; this trial design is in use across a broad range of health-related areas. Her lab works on online learning algorithms for developing personalized digital health interventions. Dr. Murphy is a member of the National Academy of Sciences and of the National Academy of Medicine, both of the US National Academies. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Fellow of the College on Problems of Drug Dependence, Past President of the Institute of Mathematical Statistics, Past President of the Bernoulli Society, and a former editor of the Annals of Statistics. 

Susan Murphy

Harvard University

Contributed Talks (6 Lightning Talks from Highlighted Accepted Papers)

11:00 AM - 12:30 PM


Lunch

12:30 - 1:30 PM


RL Executive Panel (Ordered by Last Name)

1:30 - 2:30 PM

Emma Brunskill

Stanford University

Sergey Levine

University of California, Berkeley / Google

Susan Murphy

Harvard University

Dorsa Sadigh

Stanford University

Benjamin Van Roy 

Stanford University / DeepMind

Reinforcement Learning for Interactive Robotics

2:30 - 3:30 PM

Abstract: There have been significant advances in the field of robot learning in the past decade, including approaches that integrate online and offline reinforcement learning for robotics applications. However, many challenges remain when considering how robot learning can advance interactive agents, i.e., robots that collaborate with humans, such as effectively exploring the space of partner strategies, training with humans in the loop, and designing reward functions. In this talk, I will discuss the role of learning representations for robots that interact with humans and robots that interactively learn from humans, through a few different vignettes. I will first discuss how the notion of latent strategies, low-dimensional representations sufficient for capturing non-stationary interactions, can reduce the search space of reinforcement learning agents so that they effectively coordinate and collaborate with other non-stationary partners such as humans. In addition, we can go beyond partner modeling and influence or stabilize these latent strategies. Such stabilization can be effective in other coordination settings such as bimanual manipulation. Finally, I will discuss how we can actively learn reward functions that capture human preferences using data-efficient techniques that also allow for expressive neural models. I will end the talk with some ongoing and future directions for leveraging large language models to design human preference reward functions.
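As a rough, hypothetical sketch of the latent-strategy idea described above (the architecture and dimensions are made up, not the speaker's actual models): an encoder compresses a window of the partner's recent behavior into a low-dimensional strategy embedding, and the policy conditions on both the current observation and that embedding.

```python
# Hypothetical sketch of the latent-strategy idea: embed the partner's
# recent behavior into a low-dimensional vector and condition the policy
# on it. Architecture and dimensions are illustrative only.
import torch
import torch.nn as nn

class StrategyEncoder(nn.Module):
    """Map a window of past partner (observation, action) pairs to a latent z."""
    def __init__(self, obs_dim, act_dim, z_dim=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, 64, batch_first=True)
        self.head = nn.Linear(64, z_dim)

    def forward(self, history):            # history: (batch, T, obs+act)
        _, h = self.rnn(history)
        return self.head(h[-1])            # (batch, z_dim)

class ConditionedPolicy(nn.Module):
    """Policy that conditions on the current observation and latent strategy."""
    def __init__(self, obs_dim, act_dim, z_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))  # action logits

# Usage with made-up shapes: 10-dim observations, 4 discrete actions,
# a window of 5 past partner steps.
enc, pi = StrategyEncoder(10, 4), ConditionedPolicy(10, 4)
history = torch.randn(1, 5, 14)
z = enc(history)
logits = pi(torch.randn(1, 10), z)
```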

Bio: Dorsa Sadigh is an assistant professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie at the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot and human-AI interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017, and her bachelor's degree in EECS from UC Berkeley in 2012. She has been awarded the Sloan Fellowship, NSF CAREER Award, ONR Young Investigator Award, AFOSR Young Investigator Award, DARPA Young Faculty Award, Okawa Foundation Fellowship, MIT TR35, and the IEEE RAS Early Academic Career Award.

Dorsa Sadigh

Stanford University

Break

3:30 - 4:00 PM


Challenges in Using Reinforcement Learning for Societal Benefit

4:00 - 5:00 PM

Abstract: Reinforcement learning has seen many exciting advances, but most have focused on simulated settings, where samples are cheap. In settings involving people, such as healthcare or education, samples are often expensive and there are many important modeling choices. In this talk I will discuss some of my lab's recent work toward more realistic, robust RL, motivated by challenges in education, healthcare, and other applications aimed at societal benefit. 

Bio: Emma Brunskill is an associate (tenured) professor in the Computer Science Department at Stanford University whose lab is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill's work has been honored by early faculty career awards (National Science Foundation, Office of Naval Research, Microsoft Research (1 of 7 worldwide)) and her lab has received several best research paper nominations (CHI, EDMx3) and awards (UAI, RLDMx2, ITS). 

Emma Brunskill

Stanford University

Deep Reinforcement Learning with Real-World Data

5:00 - 6:00 PM

Abstract: Machine learning systems are useful in the real world insofar as they can make decisions that lead to the outcomes that we want. Whether we want a system to drive an autonomous car or an image recognition engine to tag our friends in photographs on social media, predictions and outputs of machine learning systems lead to consequences, and we would like them to make the decisions that lead to the consequences that we prefer. This makes it natural to think about machine learning frameworks that directly reason about decisions and their consequences, namely reinforcement learning. However, reconciling reinforcement learning with the data-driven paradigm under which most modern machine learning systems operate is difficult, because reinforcement learning in its classic form is an active and online learning paradigm. Can we get the best of both worlds: the data-driven approach in supervised or unsupervised learning that can utilize large, previously collected datasets, and the decision making formalism of reinforcement learning that enables reasoning about decisions and their consequences? In this talk, I will discuss how offline reinforcement learning can make this possible, and how offline RL can enable effective pretraining from suboptimal multi-task data, broad generalization in real-world domains, and compelling applications in settings such as robotics and dialogue systems.
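One concrete instance of the offline RL recipe described above is conservative Q-learning (CQL), which penalizes Q-values on actions not supported by the dataset. The following is a minimal discrete-action sketch with made-up shapes and hyperparameters, offered as an illustration of the idea rather than the specific method in the talk.

```python
# Minimal sketch of a CQL-style update for discrete actions, as one
# illustration of offline RL. Shapes and hyperparameters are made up.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_q = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_q.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
gamma, alpha = 0.99, 1.0   # discount factor, conservatism weight

def cql_update(s, a, r, s2, done):
    """One gradient step on a batch drawn from a fixed (offline) dataset."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a_data)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_q(s2).max(dim=1).values
    bellman = ((q - target) ** 2).mean()
    # Conservatism: push down Q on all actions, push up on dataset actions.
    penalty = (torch.logsumexp(q_net(s), dim=1) - q).mean()
    loss = bellman + alpha * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage on a fake batch: 8-dim states, 4 discrete actions.
B = 32
loss = cql_update(torch.randn(B, 8), torch.randint(0, 4, (B,)),
                  torch.randn(B), torch.randn(B, 8), torch.zeros(B))
```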

Bio: Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as applications in other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.

Sergey Levine

University of California, Berkeley / Google

Highlighted Accepted Papers:

Organizers

Zheqing (Bill) Zhu

Meta AI / Stanford University

Timothy Mann

DeepMind

Haque Ishfaq

McGill University

Doina Precup

McGill University/ DeepMind

Shie Mannor

Technion / Nvidia

Call for Contribution

Submit to: https://cmt3.research.microsoft.com/RLRP2023/

Submission Deadline: November 04, 2022, AoE

Acceptance Notification Deadline: November 18, 2022, AoE. Due to a technical issue on the CMT submission side, notifications will be sent out on November 19, 2022. We apologize for the inconvenience.

Contact: rlready4prod@gmail.com 

Our goal is to create a space for the community to discuss recent advances and new algorithms ready for deployment in production. We encourage contributions including, but not limited to, the following topics:

• Efficient reinforcement learning algorithms that optimize sample complexity in real-world environments

• Counterfactual evaluation for reinforcement learning algorithms (see the sketch after this list)

• Reinforcement learning research for recommendation systems, robotics, optimization, and other industry fields that enables productionization of reinforcement learning

• Novel applications of reinforcement learning in internet services, robotics, chip design, supply chains, and other industry fields. Outcomes from these applications should come from either production environments or well-recognized high-fidelity simulators (excluding standard OpenAI Gym and standard Atari games)
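For the counterfactual-evaluation topic above, the basic idea can be seen in a minimal per-trajectory importance-sampling estimator. The function names and data layout below are hypothetical; practical estimators add weight clipping, per-decision weighting, or doubly robust corrections.

```python
# Hypothetical sketch of simple counterfactual (off-policy) evaluation:
# per-trajectory importance sampling using logged behavior-policy
# probabilities. Purely illustrative.
import numpy as np

def is_estimate(trajectories, target_prob):
    """trajectories: list of [(state, action, reward, behavior_prob), ...].
    target_prob(state, action) -> probability under the policy we evaluate."""
    values = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r, p_b in traj:
            weight *= target_prob(s, a) / p_b   # likelihood ratio
            ret += r                            # undiscounted return
        values.append(weight * ret)
    return float(np.mean(values))

# Usage on toy logged data: two states, two actions, uniform behavior policy.
logged = [[(0, 1, 1.0, 0.5), (1, 0, 0.0, 0.5)],
          [(0, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)]]
uniform = lambda s, a: 0.5        # evaluate a uniform target policy
print(is_estimate(logged, uniform))
```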

Format of workshop: This will be a one-day workshop. We have confirmed 7 distinguished reinforcement learning researchers and practitioners to speak or participate in a panel (listed in the next section; some have scheduling pending). We will have a reinforcement learning foundations panel, and talks on reinforcement learning advancements and applications in recommender systems, robotics, medical systems, and production A/B experiments. We anticipate about 4 hours of hosted content, 1.5 hours of poster sessions, and 1.5 hours of contributed talks, running from 10 am to 5 pm on the workshop day. Each talk or panel will consist of 40 minutes of content and a 10-minute Q&A session.

Submission requirements: We expect 6-8 pages for full papers, excluding references and supplementary material. Submissions are double-blind; please anonymize your paper before submission. Paper template: http://www.aaai.org/Publications/Templates/AuthorKit22.zip

Program Committee

We would like to thank the following people for their effort in making the RL Ready for Production Workshop a success!

Questions?

Contact rlready4prod@gmail.com to get more information on the Workshop!