Foundation Models for Decision Making @NeurIPS 2023

Hall E2, December 15, 2023 (Friday) @8:15 am

Foundation Models and Decision Making come together to solve complex tasks at scale.

Foundation models pretrained on diverse vision and language datasets have demonstrated exceptional capabilities in performing a wide range of downstream vision and language tasks. As foundation models are deployed in real-world applications such as dialogue, autonomous driving, healthcare, and robotics, they inevitably face new challenges such as learning from external feedback, adapting to different task modalities, and performing long-term reasoning and planning. Such challenges have traditionally been at the core of sequential decision making, encompassing areas such as reinforcement learning, imitation learning, planning, search, and optimal control. These research fields have traditionally focused on task-specific settings with limited prior knowledge, and yet there has been significant research progress in surpassing human performance in tasks like playing board games and Atari video games, as well as operating robots to complete navigation and manipulation tasks. However, since these methods generally learn to solve a specific task from scratch without broad knowledge from vision and language, they can struggle with generalization and sample efficiency.

Research at the intersection of foundation models and sequential decision making is gaining attention. Research in foundation models has expanded to address long-term reasoning and multiple model interactions, while researchers in sequential decision making are developing larger datasets and training larger-scale interactive agents. Further blurring the lines between the two fields, dialogue agents have been optimized by reinforcement learning with human feedback, and large pretrained vision-language models have been used as perception and reasoning components of embodied agents. Foundation models have also been adapted to interact with search engines, calculators, translators, simulators, and program interpreters. Despite these early successes, foundation models for decision making still face many scientific questions and challenges that have not been addressed by existing work. Examples of questions that we hope to make progress towards answering through this workshop include:

How to develop language model agents that can automatically learn to interact with humans, tools, the world, and each other in a scientific and principled way?
How to derive sound, practical, and scalable algorithms similar to RLHF and MCTS for language- and vision-based decision making applications?
How to structure environments and tasks so that vision-language foundation models can benefit traditional decision making applications in control, planning, and reinforcement learning?
Foundation models are trained on data without actions. How to overcome this limitation from both the dataset and modeling perspectives?

Goal of the workshop. The goal of this workshop is to bring together the sequential decision making community (including planning, search, RL, and optimal control) and the foundation models community in vision and language to confront the challenges in decision making at scale. The workshop will span high-level discussions on how foundation models and decision making can benefit each other when jointly considered, as well as low-level algorithmic details of various decision making algorithms and vision-language architectures, which might lead to both opportunities and challenges. More specific topics will include but are not limited to:

Foundation model agents interacting with humans, computers, tools, simulators, the physical world, and each other.
Rethinking the implementation, ecosystem, and model modularity of decision making agents under emerging technologies such as ChatGPT and language model plug-ins.
Applying foundation models to traditional decision making problems in control, planning, online / offline RL.
Learning multi-modal, multi-task, multi-environment, and generalist policies.
Long-horizon reasoning and planning in language models.
New evaluation protocols, benchmarks, datasets, and applications that apply foundation models to solve decision making problems.
Theoretical understanding of the roles foundation models play in decision making.

Invited Speakers

Percy Liang

Stanford

Ruslan Salakhutdinov

CMU

Jürgen Schmidhuber

KAUST

Russ Tedrake

MIT, TRI

Phillip Isola

MIT

Kristen Grauman

UT Austin, FAIR

Chelsea Finn

Stanford

Xinyun Chen

Google DeepMind

Organizers

Sherry Yang

UC Berkeley, Google

Ofir Nachum

OpenAI

Yilun Du

MIT

Stephen McAleer

CMU

Igor Mordatch

Google

Jeannette Bohg

Stanford

Linxi (Jim) Fan

NVIDIA

Dale Schuurmans

University of Alberta, Google

Program Committee

• Cong Lu (University of Oxford)

• Robert Kirk (UCL)

• Yingchen Xu (UCL)

• Fangchen Liu (UC Berkeley)

• Ademi Adeniji (UC Berkeley)

• Yuqing Du (UC Berkeley)

• Jongmin Lee (UC Berkeley, KAIST)

• Zhang-wei Hong (MIT)

• Siddharth Karamcheti (Stanford)

• Allen Nie (Stanford)

• Sanjana Srivastava (Stanford)

• Younggyo Seo (UC Berkeley, KAIST)

• Nicklas Hansen (UCSD)

• Nagender Aneja (Purdue University)

• Victoriano Montesinos (FAR)

• Julian Yocum (MIT)

• Jian Vora (Stanford)

• Zishun Yu (UIC)

• Katayoon Goshvadi (Google)

• Chengguang Xu (Northeastern)

• Yingjie Miao (Google)

• Ling Pan (HKUST)

• Yunshuang Li (UPenn)

• Alexander Kyimpopkin (UPenn)

• Sumedh Anand Sontakke (USC)

• Liang Chen (Google)

• Mrinal Mathur (Apple)

• Catherine Glossop (UC Berkeley)

• Andrew Melnik (Bielefeld University)

• Jingqing Ruan (CAS)

• Xinlu Zhang (UCSB)

• Mitsuhiko Nakamoto (UC Berkeley)

• Lixiang Li (Purdue)

• Tianwei Ni (MILA)

• Tianhao Wu (UC Berkeley)

• Arun Iyer (Microsoft)

• Bharat Prakash (UMBC)

• Youngchul Sung (UC Berkeley)

• Seohong Park (UC Berkeley)

• Fred Zhang (Yale)

• Zichen Zhang (Allen AI)

• Zuxin Liu (CMU)

• Bo Li (Sambanova Systems)

• Osbert Bastani (UPenn)

• Guanzhi Wang (Caltech)

• Yiwei Lyu (UMich)

• Zhihan Liu (Northwestern)

• Zirui Zhao (NUS)

• Jeonghye Kim (KAIST)

• Yunhao Yang (UT Austin)

• Luckeciano Carvalho Melo (Microsoft)

• Suvir Mirchandani (Stanford)

• Olivia Watkins (UC Berkeley)

• Rasool Fakoor (AWS)

• Jordi Orbay (Google)

• Xuefeng Liu (U Chicago)

• Dylan Hadfield-Menell (MIT)

• Ekdeep Singh Lubana (UMich)

• Ruizhe Shi (Tsinghua)

• Claudia D’Arpino (NVIDIA)

• Hiroki Furuta (University of Tokyo)

• Yusuke Iwasawa (University of Tokyo)

• Bo Dai (Georgia Tech)

• Aleksandra Faust (Google)

• Kuang-Huei Lee (Google)

• Kimin Lee (Google)

• Austin Huang (Google)

• Kiarash Rahmani (UT Austin)

• Kanishk Gandhi (Stanford)

• Kaixin Ma (CMU)

• Ruijie Zheng (Maryland)

• Chuning Zhu (UW)

• Philip Ball (Oxford)

• Murtaza Dalal (CMU)

• Longtao Zheng (NTU)

• Yongyuan Liang (University of Maryland)

• Sanjay Chawla (HBKU)

• Shuyan Zhou (CMU)

• Suyoung Lee (KAIST)

• Mingtong Zhang (UIUC)

• Raphael Schumann (Heidelberg University)

• Yanjie Ze (SJTU)

• Ruohan Zhang (Stanford)

• Pranav Atreya (UT Austin)

• Kaya Stechly

• Xinyu Zhou (Wuhan University of Technology)

• Braham Snyder (UT Austin)

• Ziyu Wan (SJTU)

• Yunfan Jiang (Stanford)

• Sowmen Das (University of Cambridge)

• Brian Ichter (Google)

• Top Piriyakulkij (Cornell)

• Penglin Cai (PKU)

• Haoqi Yuan (PKU)

• Matteo Pirotta (Meta)

• Hangyu Mao (PKU)

• Yuchen Cui (Stanford)

• Yaru Niu (CMU)

• Ali Baheri (RIT)

• Ge Gao (Cornell)

• Keerthana Gopalakrishnan (Google)

• Yu Bai (PKU)

• Yixuan Wang (UIUC)

• Mehrdad Yazdani (UCSB)

• Byoungjip Kim (LG AI Research)

• Kolby Nottingham (NYU)

• Michael Zhang (U Toronto)

• Matthew Marquez (ASU)

• Shuhuai Ren (PKU)

• Changhee Joo (Korea University)

• Derek Guo (UC Berkeley)

• Ziniu Hu (UCLA)

• Jiageng Mao (USC)

• Kanika Madan (MILA)

• Ekdeep Singh Lubana (Microsoft)