Language Models meet World Models (and Agent Models)

8:30am - 12:30pm Pacific Time, Tuesday February 20, 2024

@ AAAI 2024, Room 111

Vancouver, Canada

https://aaai.org/aaai-conference/aaai-24-tutorial-and-lab-list/#th03

Abstract

Large language models (LMs) have achieved remarkable success in many language tasks. Recent work has also shown that knowledge of the world can emerge from large LMs, enabling them to assist decision-making for embodied tasks. However, the world knowledge exhibited by current large LMs is often not robust and cannot be grounded in physical environments without additional models. This hinders their ability to perform complex reasoning and planning tasks reliably. For example, when creating action plans to move blocks to a target state, GPT-4 achieves a significantly lower success rate than humans.


On the other hand, humans perform deliberate reasoning and planning based on a mental model of the world, known as a world model (WM), which enables us to simulate actions and their effects on the world's state. WMs that encode knowledge of the physical world can drastically improve the data efficiency and robustness of intelligent agents.


However, WMs have typically been studied in reinforcement learning and robotics, areas conceptually distinct from the problems studied in language modeling. This gap points to new opportunities for connecting WMs and LMs to enhance LM capabilities in reasoning and planning, in both embodied and general settings, and to address the aforementioned limitations. Emerging studies at the intersection of WMs and LMs have demonstrated promising results. This tutorial aims to summarize and present a unified view of connecting WMs and LMs, highlighting the various opportunities for improved machine reasoning and planning with large LMs through world modeling. We will review recent work on learning WMs and using them to further learn and perform embodied tasks. We will show how LMs can use external WMs to compensate for their lack of grounded world knowledge, and how LMs themselves can learn internal WMs from embodied experiences beyond text data and use these internal WMs to guide complex reasoning.
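
As a concrete illustration of the last point, below is a minimal, hypothetical sketch (in Python) of world-model-guided planning: an LM-based policy proposes candidate actions, an LM-based world model simulates each action's effect on the state, and a heuristic selects the best predicted state. All names here (`plan`, `propose_actions`, `predict_next_state`, `score`) are illustrative stand-ins, not an API presented in the tutorial.

```python
# A minimal, hypothetical sketch of world-model-guided planning with an LM.
# propose_actions / predict_next_state / score are stand-ins for LM calls
# and a task-specific heuristic; the greedy lookahead is illustrative only.
from typing import Callable, List, Tuple

def plan(
    initial_state: str,
    goal: str,
    propose_actions: Callable[[str], List[str]],    # LM as policy: candidate actions
    predict_next_state: Callable[[str, str], str],  # LM as world model: simulated effect
    score: Callable[[str, str], float],             # heuristic closeness of state to goal
    max_steps: int = 10,
) -> List[Tuple[str, str]]:
    """Greedy lookahead: simulate each candidate action in the world model
    and commit to the action whose predicted next state scores best."""
    state, trajectory = initial_state, []
    for _ in range(max_steps):
        candidates = propose_actions(state)
        if not candidates:
            break
        # Simulate every candidate action with the internal world model.
        simulated = [(a, predict_next_state(state, a)) for a in candidates]
        action, state = max(simulated, key=lambda pair: score(pair[1], goal))
        trajectory.append((action, state))
        if score(state, goal) >= 1.0:  # goal reached under the heuristic
            break
    return trajectory
```

In practice, the greedy step is often replaced by tree search (e.g., Monte Carlo tree search) over simulated states, trading more world-model queries for more deliberate plans.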

Materials

[Videos]: Video recordings of the NeurIPS 2023 version of this tutorial are available [here]

Schedule

Presenters

Zhiting Hu

Assistant Professor

UC San Diego

Tianmin Shu

Assistant Professor

Johns Hopkins University
