Synergy of Reinforcement Learning and Large Language Models (RL+LLMs) @ AAAI 2024
Large Language Models (LLMs) took us by storm with the launch of ChatGPT and GPT-4, following the development of BERT, T5, and GPT. ChatGPT and many other LLMs adopt RL from human feedback (RLHF) for human alignment, and they excel out of the box at several NLP tasks such as machine translation, summarization, and much more. At the same time, RL has had remarkable achievements such as playing Go, StarCraft II, and Gran Turismo. Beyond games, RL has helped with magnetic control of tokamak plasmas, navigating stratospheric balloons, and discovering faster matrix multiplication and sorting algorithms. We believe there is a huge untapped potential in the marriage of these two fields. In particular, LLMs can further benefit from the RL framework for:
Planning: RL can empower LLMs to evaluate and optimize sequences of actions over time, enabling more coherent and effective planning. By receiving feedback on their planning decisions, LLMs can iteratively refine their strategies to excel in task-oriented scenarios. Consequently, this continuous feedback loop ensures that LLMs become progressively adept at handling complex, multi-step tasks and user interactions, leading to more goal-directed interactions.
Exploration: The RL framework can facilitate adaptive exploration in LLMs, allowing them to weigh the benefits of generating an immediate response versus seeking additional user input. By constantly adjusting to feedback, LLMs can optimize this trade-off, ensuring they query the user when ambiguity exists and confidently generate answers when sufficient information is available. This iterative learning process ensures that LLMs become increasingly proficient at navigating user interactions, striking a balance between proactive responses and seeking clarifications.
Personalization: Using RL, LLMs can dynamically adapt to individual user preferences and behaviors, honing their outputs to align with specific user needs over time. By analyzing feedback and rewards tied to the user satisfaction, LLMs can iteratively refine their responses, ensuring they resonate more closely with individual users. This continuous adaptation allows LLMs to deliver highly personalized interactions, fostering a more tailored and engaging user experience.
While the RL community has started to bring LLM advances to its side (e.g., Decision Transformer, Trajectory Transformer), LLMs provide unparalleled opportunities to enhance RL further:
Rich Representation: LLMs can help RL by encoding intricate environmental nuances into comprehensive state representations, addressing the challenges of non-Markovian situations where current states may not hold all the relevant information. Furthermore, leveraging their vast knowledge, LLMs can equip RL agents with prior knowledge in sparse-data situations, boosting zero/few-shot learning.
Explainability: LLMs can serve as interpreters for RL agents, transforming complex policy decisions into intuitive, human-readable narratives. By distilling the underlying logic and rationale of an agent's actions, LLMs make it easier for users to trust and collaborate with AI systems. Furthermore, LLMs facilitate a two-way dialogue, allowing humans to query and understand the motivations, strategies, and potential future actions of RL agents in dynamic environments.
Task Decomposition: LLMs can aid RL agents in task decomposition by breaking down high-level goals into more manageable sub-tasks using their deep semantic understanding. LLMs can provide structured insights and hierarchies, allowing RL agents to approach problems in a modular fashion and optimizing solutions for individual components.
The goal of our workshop is to bring together the RL and LLM communities to facilitate cross-pollination. We will discuss possible opportunities, including but not limited to the above areas. Furthermore, we will have practitioners share their success stories of working on RL+LLM problems and the insights gained from such applications.
Photos
⏰️ Schedule
8:30 - Poster Setup
8:50 - Welcome [slides]
9:00 - [Invited Talk] Charting New Pathways: Next-Gen LLM Reasoners, Asli Celikyilmaz @ Meta
9:45 - [Invited Talk] From Bots to Buddies: Making Conversational Agents More Human-Like, Yun Nung (Vivian) Chen @ NTU [slides]
10:30 - Posters / Coffee Break
11:00 - [Invited Talk] Autonomous Agents in the Age of Large Language Models, Aleksandra Faust @ Google
11:45 - [Contributed] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4, Haimei Zhao @ SydneyU
12:00 - [Contributed] Lightning Talks [slides]
12:15 - 🍔 Lunch / Posters
2:00 - [Invited Talk] Glide-RL: A student/teacher framework for instruction learning and generalization, Matthew Taylor @ UAlberta
2:45 - [Invited Talk] Language Model Alignment: Theory & Practice, Ahmad Beirami @ Google
3:30 - Posters / Coffee Break
4:00 - [Panel] Invited Speakers + Michael Littman (Brown University) / Lihong Li (Amazon)
5:00 - Closing
5:05 - Poster Session
Invited Speakers/Panelists
Meta
National Taiwan University
Google DeepMind
University of Alberta
Questions for the Panels
We will have two panels during the workshop, one focusing on how RL can leverage LLM capabilities and one discussing how LLMs can benefit from RL. Please submit your questions or upvote existing questions here:
Submissions
We invite all researchers passionate about large language models and reinforcement learning to submit to this workshop. This dynamic crossroads represents the cutting edge of artificial intelligence, where the power of language models meets the finesse of reinforcement learning algorithms. While we discussed various key areas connecting the two realms, we want to emphasize that these topics serve as a starting point rather than a rigid framework. Whether you're exploring novel applications, pushing the boundaries of existing frameworks, or crafting innovative solutions, we welcome your submissions.
Format: 4-8 pages in AAAI format
Submission Page: Link
Review Policy: Double-blind. Authors are responsible for the anonymity of all their submission content.
Dual-Submission: Our workshop is non-archival. Ongoing / unpublished work is welcome. However, work already published at any venue will not be considered.
Important Dates
Submission Deadline: November 24th 2023, AoE
Author Notification: December 14th, 2023, AoE
Early Workshop Registration: Jan 10th 2024
Camera Ready Deadline: January 15th 2024, AoE
Workshop: February 26th 2024
Attendance
Our workshop will be held in Room 215, Vancouver Convention Centre – West Building.
All presentations will be in-person.
In-person participation is encouraged but remote participation will be supported for talks.
All workshop attendees (in-person/remote) need to register for the workshop section of AAAI 2024.
Accepted Papers
Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 [pdf 📄]
Guo, Jiaxian*; Yang, Bo; Yoo, Paul; Lin, Yuchen; Iwasawa, Yusuke; Matsuo, Yutaka
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts [pdf 📄]
Ni, Fei*; Hao, Jianye; Wu, Shiguang; Longxin, Kou; Liu, Jiashun; Zheng, Yan; Wang, Bin; Zhuang, Yuzheng
CriticGPT: Multimodal LLM as a Critic for Robot Manipulation [pdf 📄]
Liu, Jinyi*; Yuan, Yifu; Hao, Jianye; Ni, Fei; Fu, Lingzhi; Chen, Yibin; Zheng, Yan
Decision Transformer With Tokenized Actions [pdf 📄]
Annett, Graham*; Andersen, Timothy
Reinforcement Learning for Optimizing RAG for Domain Chatbots [pdf 📄]
Kulkarni, Mandar*; Tangarajan, Praveen; Kim, Kyung; Trivedi, Anusua
Software Security Vulnerability Repair Using Reinforcement Learning with Large Language Models [pdf 📄]
Islam, Nafis Tanveer*; Karkevandi, Mohammad Bahrami; Najafirad, Peyman
Exploring Reinforcement Learning with Large Language Models for Enhancing Badminton Players' Strategies [pdf 📄]
Wang, Kuang-Da; Wang, Yung-Chien*; Chien, Yen-Che; Peng, Wen-Chih Chris; Bo Zhou, Hsieh; Yong En, Tian
DeLF: Designing Learning Environments with Foundation Models [pdf 📄]
Afshar, Aida*; Li, Wenchao