Language Models meet World Models (and Agent Models)
8:30am - 12:30pm Pacific Time, Tuesday February 20, 2024
@ AAAI 2024, Room 111
Vancouver, Canada
https://aaai.org/aaai-conference/aaai-24-tutorial-and-lab-list/#th03
Abstract
Large language models (LMs) have achieved remarkable success in many language tasks. Recent work has also shown that knowledge of the world can emerge from large LMs, enabling them to assist decision-making in embodied tasks. However, the world knowledge exhibited by current large LMs is often not robust and cannot be grounded in physical environments without additional models. This hinders their ability to perform complex reasoning and planning tasks reliably. For example, when creating action plans to move blocks to a target state, GPT-4 achieves a significantly lower success rate than humans.
Humans, on the other hand, perform deliberate reasoning and planning based on a mental model of the world, also known as a world model (WM), which enables us to simulate actions and their effects on the world's state. WMs that encode knowledge of the physical world can drastically improve the data efficiency and robustness of intelligent agents.
However, WMs have typically been studied in reinforcement learning and robotics, areas conceptually distinct from the problems studied in language modeling. This gap indicates new opportunities for connecting WMs and LMs to enhance LM capabilities in reasoning and planning, in both embodied and general settings, and to address the aforementioned limitations. Emerging studies at the intersection of WMs and LMs have demonstrated promising results. This tutorial aims to summarize and present a unified view of connecting WMs and LMs, highlighting the various opportunities for improved machine reasoning and planning based on large LMs through world modeling. We will review recent work on learning WMs and using them to further learn and perform embodied tasks. We will show how LMs can use external WMs to compensate for their lack of grounded world knowledge, and how LMs themselves can learn world models from embodied experiences beyond text data and use these internal WMs to guide complex reasoning.
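To make the idea of "simulating actions and their effects on the world's state" concrete, here is a minimal sketch of planning with a world model on a toy block-stacking problem. It is an illustration only and is not taken from the tutorial materials: the world model here is a hand-written symbolic transition function rather than a learned one, the planner is plain breadth-first search, and all names (`to_state`, `legal_actions`, `step`, `plan`) are hypothetical.

```python
from collections import deque

# Assumption: a state records what each block rests on (another block or
# "table"), stored as a sorted tuple of (block, support) pairs so it is hashable.

def to_state(on):
    """Convert a {block: support} dict into a hashable state."""
    return tuple(sorted(on.items()))

def legal_actions(state):
    """List moves (block, destination) for blocks with nothing on top."""
    on = dict(state)
    supports = set(on.values())
    clear = [b for b in on if b not in supports]
    actions = []
    for block in clear:
        if on[block] != "table":
            actions.append((block, "table"))
        for dest in clear:
            if dest != block and on[block] != dest:
                actions.append((block, dest))
    return actions

def step(state, action):
    """Toy world model: predict the next state after taking an action."""
    block, dest = action
    on = dict(state)
    on[block] = dest
    return to_state(on)

def plan(start, goal):
    """Breadth-first search over states simulated by the world model."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action in legal_actions(state):
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # goal unreachable

# Usage: C starts on A; the goal is the tower A on B on C.
start = to_state({"A": "table", "B": "table", "C": "A"})
goal = to_state({"A": "B", "B": "C", "C": "table"})
print(plan(start, goal))
```

The planner never acts in a real environment: every candidate action is evaluated by calling `step`, which is the role a world model plays. In the settings the tutorial covers, that transition function would be learned or supplied by an LM rather than written by hand.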
Materials
Schedule
Part I: Large Language Models and their limitations (30 mins)
Limitations of LLMs in language reasoning, embodied reasoning, and social reasoning
Part II: Background of World Models and Agent Models (30 mins)
Part III: Reasoning with World and Agent Models, on the Language Model backend (60 mins)
Language model as world model, agent model, goals/reward, planner, belief, ...
Break: 10:30am - 11:00am
Part IV: Enhancing the Backend beyond Language Models (75 mins)
Richer learning paradigms: learning with embodied experiences, social learning
Multi-modal world modeling
Latent-space reasoning
Future directions
Presenters
References
Hu and Shu, "Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning", 2023, https://arxiv.org/abs/2312.05230
(All references can be found in the above perspective/review paper)