AAAI 2023
Reinforcement Learning Ready for Production Workshop

 Towards bringing reinforcement learning to production 

 

Summary

Reinforcement learning aims to solve sequential decision problems through observation of and interaction with an environment, human demonstrations, or logged data from historical interactions. It has demonstrated great success in simulated environments and games with clear rules and boundaries, such as Atari games, Go, Shogi, poker, and many robotic simulations. Success in these highly complex environments suggests that reinforcement learning can scale to real-world scenarios. Recently, there has been growing interest in bringing reinforcement learning to production and optimizing beyond immediate consequences in real-life problems. The 1st Reinforcement Learning Ready for Production workshop, held at AAAI 2023, focuses on reinforcement learning trends and algorithmic developments that bridge the gap between theoretical reinforcement learning and production environments. 

Schedule (2/14/2023, Washington D.C., Eastern Time)

Room 146B at the AAAI Conference Venue, the Walter E. Washington Convention Center in Washington DC

Virtual Participation (Registration Required): https://underline.io/events/390/sessions?eventSessionId=+14197

Early Social Time

9:30 - 9:45 AM

Opening Remarks

9:45 - 10:00 AM

Zheqing (Bill) Zhu

Meta AI / Stanford University

Trials and Tribulations: Ensuring the Oralytics RL Algorithm is Ready for Production!

10:00 - 11:00 AM

Abstract: Dental disease continues to be one of the most prevalent chronic diseases in the United States. Recent advances in digital technology, including electric toothbrushes and smartphones, offer much potential for promoting quality tooth-brushing in real-time, real-world settings. Sensors in electric toothbrushes and smartphones provide brushing data, and matching mobile apps deliver on-demand feedback and educational information. Behavioral scientists have developed a variety of prompts to encourage engagement with the app and the associated tooth-brushing behaviors. Here we describe the multiple stages of development of an online reinforcement learning algorithm for learning which type of prompt (or no prompt), in which state, is most effective in promoting quality tooth-brushing. Challenges in designing and testing an RL algorithm in these real-life health settings include developing a testbed from incomplete data, ensuring the RL algorithm can learn and run stably under a variety of constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. Throughout, we apply a generalization of the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to design and test the Oralytics RL algorithm. The Oralytics RL algorithm is currently being pilot tested, with the goal of starting the clinical trial in March 2023.
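The talk describes the algorithm design at a high level; as a rough illustration of the kind of online decision-making involved, here is a minimal Thompson-sampling contextual bandit for the prompt / no-prompt decision. This is a hypothetical sketch, not the actual Oralytics algorithm: the features, prior, and reward below are made up.

```python
# Hypothetical sketch: Thompson-sampling contextual bandit for the
# prompt / no-prompt decision. NOT the actual Oralytics algorithm.
import numpy as np

class LinearTS:
    """Bayesian linear regression with a N(0, sigma2/lam * I) prior per action."""

    def __init__(self, n_features, n_actions=2, sigma2=1.0, lam=1.0):
        self.sigma2 = sigma2
        # Posterior precision matrix and data vector for each action.
        self.precision = [lam * np.eye(n_features) for _ in range(n_actions)]
        self.b = [np.zeros(n_features) for _ in range(n_actions)]

    def act(self, x):
        """Sample weights from each action's posterior; pick the best action."""
        scores = []
        for P, b in zip(self.precision, self.b):
            mean = np.linalg.solve(P, b)
            cov = self.sigma2 * np.linalg.inv(P)
            w = np.random.multivariate_normal(mean, cov)
            scores.append(w @ x)
        return int(np.argmax(scores))

    def update(self, x, action, reward):
        """Standard conjugate Bayesian linear-regression update."""
        self.precision[action] += np.outer(x, x)
        self.b[action] += reward * x

# Usage: x encodes the user's state (e.g., time of day, recent brushing).
agent = LinearTS(n_features=4)
x = np.array([1.0, 0.3, 0.0, 1.0])   # hypothetical state features
a = agent.act(x)                     # 0 = no prompt, 1 = send prompt
agent.update(x, a, reward=1.0)       # reward: observed brushing quality
```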

Bio: Susan Murphy's research focuses on improving sequential, individualized decision making in digital health. She developed the micro-randomized trial for use in constructing digital health interventions; this trial design is in use across a broad range of health-related areas. Her lab works on online learning algorithms for developing personalized digital health interventions. Dr. Murphy is a member of the National Academy of Sciences and of the National Academy of Medicine, both of the US National Academies. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Fellow of the College on Problems of Drug Dependence, Past President of the Institute of Mathematical Statistics, Past President of the Bernoulli Society, and a former editor of the Annals of Statistics. 

Susan Murphy

Harvard University

Contributed Talks (6 Lightning Talks from Highlighted Accepted Papers)

11:00 AM - 12:30 PM


Lunch

12:30 - 1:30 PM


RL Executive Panel (Ordered by Last Name)

1:30 - 2:30 PM

Emma Brunskill

Stanford University

Sergey Levine

University of California, Berkeley / Google

Susan Murphy

Harvard University

Dorsa Sadigh

Stanford University

Benjamin Van Roy 

Stanford University / DeepMind

Reinforcement Learning for Interactive Robotics

2:30 - 3:30 PM

Abstract: There have been significant advances in the field of robot learning in the past decade, including approaches that integrate online and offline reinforcement learning for robotics applications. However, many challenges remain when considering how robot learning can advance interactive agents, i.e., robots that collaborate with humans, such as effectively exploring the space of partner strategies, training with humans in the loop, and designing reward functions. In this talk, I will discuss the role of learning representations for robots that interact with humans and robots that interactively learn from humans, through a few different vignettes. I will first discuss how the notion of latent strategies, low-dimensional representations sufficient for capturing non-stationary interactions, can reduce the search space of reinforcement learning agents so that they effectively coordinate and collaborate with other non-stationary partners such as humans. In addition, we can go beyond partner modeling and influence or stabilize these latent strategies. Such stabilization can be effective in other coordination settings such as bimanual manipulation. Finally, I will discuss how we can actively learn reward functions that capture human preferences using data-efficient techniques that also allow for expressive neural models. I will end the talk with some ongoing and future directions for leveraging large language models to design human preference reward functions.
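As a rough, hypothetical sketch of the latent-strategy idea described above (the architecture and dimensions are made up, not the speaker's actual models): an encoder compresses a window of the partner's recent behavior into a low-dimensional strategy embedding, and the policy conditions on both the current observation and that embedding.

```python
# Hypothetical sketch of the latent-strategy idea: embed the partner's
# recent behavior into a low-dimensional vector and condition the policy
# on it. Architecture and dimensions are illustrative only.
import torch
import torch.nn as nn

class StrategyEncoder(nn.Module):
    """Map a window of past partner (observation, action) pairs to a latent z."""
    def __init__(self, obs_dim, act_dim, z_dim=8):
        super().__init__()
        self.rnn = nn.GRU(obs_dim + act_dim, 64, batch_first=True)
        self.head = nn.Linear(64, z_dim)

    def forward(self, history):            # history: (batch, T, obs+act)
        _, h = self.rnn(history)
        return self.head(h[-1])            # (batch, z_dim)

class ConditionedPolicy(nn.Module):
    """Policy that conditions on the current observation and latent strategy."""
    def __init__(self, obs_dim, act_dim, z_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))  # action logits

# Usage with made-up shapes: 10-dim observations, 4 discrete actions,
# a window of 5 past partner steps.
enc, pi = StrategyEncoder(10, 4), ConditionedPolicy(10, 4)
history = torch.randn(1, 5, 14)
z = enc(history)
logits = pi(torch.randn(1, 10), z)
```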

Bio: Dorsa Sadigh is an assistant professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie at the intersection of robotics, learning, and control theory. Specifically, she is interested in developing algorithms for safe and adaptive human-robot and human-AI interaction. Dorsa received her doctoral degree in Electrical Engineering and Computer Sciences (EECS) from UC Berkeley in 2017, and her bachelor's degree in EECS from UC Berkeley in 2012. She has been awarded the Sloan Fellowship, NSF CAREER Award, ONR Young Investigator Award, AFOSR Young Investigator Award, DARPA Young Faculty Award, Okawa Foundation Fellowship, MIT TR35, and the IEEE RAS Early Academic Career Award.

Dorsa Sadigh

Stanford University

Break

3:30 - 4:00 PM


Challenges in Using Reinforcement Learning for Societal Benefit

4:00 - 5:00 PM

Abstract: Reinforcement learning has seen many exciting advances, but most have focused on simulated settings, where samples are cheap. In settings involving people, such as healthcare or education, samples are often expensive and there are many important modeling choices. In this talk I will discuss some of my lab's recent work toward more realistic, robust RL, motivated by challenges in education, healthcare, and other applications aimed at societal benefit. 

Bio: Emma Brunskill is an associate (tenured) professor in the Computer Science Department at Stanford University whose lab is part of the Stanford AI Lab, the Stanford Statistical ML group, and AI Safety @Stanford. Brunskill's work has been honored by early faculty career awards (National Science Foundation, Office of Naval Research, Microsoft Research (1 of 7 worldwide)) and her lab has received several best research paper nominations (CHI, EDMx3) and awards (UAI, RLDMx2, ITS). 

Emma Brunskill

Stanford University

Deep Reinforcement Learning with Real-World Data

5:00 - 6:00 PM

Abstract: Machine learning systems are useful in the real world insofar as they can make decisions that lead to the outcomes that we want. Whether we want a system to drive an autonomous car or an image recognition engine to tag our friends in photographs on social media, predictions and outputs of machine learning systems lead to consequences, and we would like them to make the decisions that lead to the consequences that we prefer. This makes it natural to think about machine learning frameworks that directly reason about decisions and their consequences, namely reinforcement learning. However, reconciling reinforcement learning with the data-driven paradigm under which most modern machine learning systems operate is difficult, because reinforcement learning in its classic form is an active and online learning paradigm. Can we get the best of both worlds: the data-driven approach in supervised or unsupervised learning that can utilize large, previously collected datasets, and the decision making formalism of reinforcement learning that enables reasoning about decisions and their consequences? In this talk, I will discuss how offline reinforcement learning can make this possible, and how offline RL can enable effective pretraining from suboptimal multi-task data, broad generalization in real-world domains, and compelling applications in settings such as robotics and dialogue systems.
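One concrete instance of the offline RL recipe described above is conservative Q-learning (CQL), which penalizes Q-values on actions not supported by the dataset. The following is a minimal discrete-action sketch with made-up shapes and hyperparameters, offered as an illustration of the idea rather than the specific method in the talk.

```python
# Minimal sketch of a CQL-style update for discrete actions, as one
# illustration of offline RL. Shapes and hyperparameters are made up.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_q = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_q.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
gamma, alpha = 0.99, 1.0   # discount factor, conservatism weight

def cql_update(s, a, r, s2, done):
    """One gradient step on a batch drawn from a fixed (offline) dataset."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a_data)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_q(s2).max(dim=1).values
    bellman = ((q - target) ** 2).mean()
    # Conservatism: push down Q on all actions, push up on dataset actions.
    penalty = (torch.logsumexp(q_net(s), dim=1) - q).mean()
    loss = bellman + alpha * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage on a fake batch: 8-dim states, 4 discrete actions.
B = 32
loss = cql_update(torch.randn(B, 8), torch.randint(0, 4, (B,)),
                  torch.randn(B), torch.randn(B, 8), torch.zeros(B))
```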

Bio: Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as applications in other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, deep reinforcement learning algorithms, and more.

Sergey Levine

University of California, Berkeley / Google

Highlighted Accepted Papers:

Organizers

Zheqing (Bill) Zhu

Meta AI / Stanford University

Timothy Mann

DeepMind

Haque Ishfaq

McGill University

Doina Precup

McGill University/ DeepMind

Shie Mannor

Technion / Nvidia

Call for Contribution

Submit to: https://cmt3.research.microsoft.com/RLRP2023/

Submission Deadline: November 04, 2022, AoE

Acceptance Notification Deadline: November 18, 2022, AoE. Due to a technical issue on the CMT submission side, notifications will be sent out on November 19, 2022. We apologize for the inconvenience.

Contact: rlready4prod@gmail.com 

Our goal is to create a space for the community to discuss recent advances and new algorithms ready for deployment in production. We encourage contributions including, but not limited to, the following topics:

• Efficient reinforcement learning algorithms that optimize sample complexity in real-world environments

• Counterfactual evaluation for reinforcement learning algorithms (see the sketch after this list)

• Reinforcement learning research for recommendation systems, robotics, optimization, and other industry fields that enables productionization of reinforcement learning

• Novel applications of reinforcement learning in internet services, robotics, chip design, supply chains, and other industry fields. Outcomes from these applications should come from either production environments or well-recognized high-fidelity simulators (excluding standard OpenAI Gym and standard Atari games)
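For the counterfactual-evaluation topic above, the basic idea can be seen in a minimal per-trajectory importance-sampling estimator. The function names and data layout below are hypothetical; practical estimators add weight clipping, per-decision weighting, or doubly robust corrections.

```python
# Hypothetical sketch of simple counterfactual (off-policy) evaluation:
# per-trajectory importance sampling using logged behavior-policy
# probabilities. Purely illustrative.
import numpy as np

def is_estimate(trajectories, target_prob):
    """trajectories: list of [(state, action, reward, behavior_prob), ...].
    target_prob(state, action) -> probability under the policy we evaluate."""
    values = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r, p_b in traj:
            weight *= target_prob(s, a) / p_b   # likelihood ratio
            ret += r                            # undiscounted return
        values.append(weight * ret)
    return float(np.mean(values))

# Usage on toy logged data: two states, two actions, uniform behavior policy.
logged = [[(0, 1, 1.0, 0.5), (1, 0, 0.0, 0.5)],
          [(0, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)]]
uniform = lambda s, a: 0.5        # evaluate a uniform target policy
print(is_estimate(logged, uniform))
```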

Format of workshop: This will be a one-day workshop. We have confirmed 7 distinguished reinforcement learning researchers and practitioners to speak or participate in a panel (listed in the next section; some have scheduling pending). We will have a reinforcement learning foundations panel, and talks on reinforcement learning advancements and applications in recommender systems, robotics, medical systems, and production A/B experiments. We anticipate about 4 hours of hosted content, 1.5 hours of poster sessions, and 1.5 hours of contributed talks, running from 10 am to 5 pm on the workshop day. Each talk or panel will consist of 40 minutes of content and a 10-minute Q&A session.

Submission requirements: We expect 6-8 pages for full papers, excluding references and supplementary material. Submissions are double-blind; please anonymize your paper before submission. Paper template: http://www.aaai.org/Publications/Templates/AuthorKit22.zip

Program Committee

We would like to thank the following people for their effort in making the RL Ready for Production Workshop a success!

Questions?

Contact rlready4prod@gmail.com to get more information on the Workshop!