Language Models meet World Models
9:45am - 12:15pm CST, Monday December 11, 2023
@ NeurIPS 2023
https://neurips.cc/virtual/2023/tutorial/73952
Update: The video recording has been released on the NeurIPS website: https://nips.cc/virtual/2023/tutorial/73952
Abstract
Large language models (LMs) have achieved remarkable success in many language tasks. Recent works have also shown that knowledge of the world can emerge from large LMs, enabling them to assist decision-making for embodied tasks. However, the world knowledge exhibited by current large LMs is often not robust and cannot be grounded in physical environments without additional models. This hinders their ability to perform complex reasoning and planning tasks reliably. For example, in creating action plans to move blocks to a target state, GPT-4 achieves a significantly lower success rate compared to humans.
On the other hand, humans perform deliberate reasoning and planning based on the mental model of the world, also known as a world model (WM), which enables us to simulate actions and their effects on the world's state. WMs encoding knowledge of the physical world can drastically improve the data efficiency and robustness of intelligent agents.
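To make the role of a WM concrete, below is a minimal, self-contained sketch (a toy illustration, not any specific system covered in this tutorial) of deliberate planning with a deterministic world model: the agent simulates candidate actions against a transition function and searches for an action sequence that reaches the goal state, without ever acting in the real environment.

```python
from collections import deque

# Toy world model: a deterministic transition function over 1-D positions.
# state: integer position; actions move left/right or stay in place.
ACTIONS = {"left": -1, "stay": 0, "right": +1}

def world_model(state: int, action: str) -> int:
    """Simulate the effect of an action on the world's state."""
    return state + ACTIONS[action]

def plan(start: int, goal: int, max_depth: int = 10):
    """Breadth-first search over imagined rollouts of the world model."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions  # first (hence shortest) plan found
        if len(actions) >= max_depth:
            continue
        for a in ACTIONS:
            nxt = world_model(state, a)  # simulate, don't act
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None  # no plan within the search horizon

print(plan(start=0, goal=3))  # ['right', 'right', 'right']
```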
However, WMs have typically been studied in reinforcement learning and robotics, areas conceptually distinct from the problems studied in language modeling. This gap points to new opportunities for connecting WMs and LMs to enhance LM capabilities in reasoning and planning, in both embodied and general settings, and to address the aforementioned limitations. Emerging studies at the intersection of WMs and LMs have demonstrated promising results. This tutorial aims to summarize and present a unified view of connecting WMs and LMs, highlighting the various opportunities for improved machine reasoning and planning based on large LMs through world modeling. We will review recent works on learning WMs and using them to learn and perform embodied tasks. We will show how LMs can use external WMs to compensate for their lack of grounded world knowledge, and how LMs themselves can learn world models from embodied experiences beyond text data and use these internal WMs to guide complex reasoning.
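Concretely, one reading of "an LM using a WM to guide reasoning" is lookahead search in which the LM plays several roles at once: proposing actions (agent model), simulating their outcomes (world model), and scoring progress toward a goal (reward model). The sketch below only illustrates this pattern; it is not the method of any particular paper covered here, and query_lm is a hypothetical stand-in for whatever text-completion client you use.

```python
def query_lm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    raise NotImplementedError("plug in your LM client here")

def propose_actions(state: str, n: int = 3) -> list[str]:
    # LM as agent model: propose candidate next actions.
    out = query_lm(f"State: {state}\nList {n} possible next actions, one per line.")
    return out.strip().splitlines()[:n]

def simulate(state: str, action: str) -> str:
    # LM as world model: predict the state resulting from the action.
    return query_lm(f"State: {state}\nAction: {action}\nPredict the resulting state.")

def score(state: str, goal: str) -> float:
    # LM as goal/reward model: rate progress toward the goal on a 0-10 scale.
    # (Assumes the LM answers with a bare number that float() can parse.)
    return float(query_lm(f"Goal: {goal}\nState: {state}\nRate progress 0-10."))

def greedy_plan(state: str, goal: str, steps: int = 5) -> list[str]:
    """Greedy one-step lookahead: imagine each action's outcome, keep the best."""
    plan = []
    for _ in range(steps):
        candidates = [(a, simulate(state, a)) for a in propose_actions(state)]
        action, state = max(candidates, key=lambda c: score(c[1], goal))
        plan.append(action)
        if score(state, goal) >= 10:
            break
    return plan
```

Deeper search (e.g., Monte-Carlo tree search over imagined LM rollouts) follows the same pattern, trading more simulation calls for better plans.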
Part I: Large Language Models and Their Limitations
Limitations of LLMs in language reasoning, embodied reasoning, and social reasoning
Part II: Background of World Models and Agent Models
Part III: Reasoning with World and Agent Models on the Language Model Backend
Language model as world model, agent model, goals/reward, planner, belief, ...
Part IV: Panel Discussion
Part V: Enhancing the Language Model Backend
Richer learning paradigms: learning with embodied experiences, social learning
Multi-modal world modeling
Presenters
Panelists
References
Hu and Shu, "Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning", 2023, https://arxiv.org/abs/2312.05230
(All references can be found in the above perspective/review paper)