Invited Speakers (In alphabetical order, updated in real time)
Genie 3 is a general-purpose world model that can generate an unprecedented diversity of interactive environments from a single text prompt. This marks a significant advance from static video generation to fully interactive simulations of worlds.
Our model is the first foundation world model that allows real-time interaction at 720p resolution and a consistent 24 frames per second. Genie 3 maintains consistency for minutes of continuous interaction, showing marked improvements in realism and coherence over previous-generation models. Furthermore, Genie 3 introduces “promptable world events”, allowing users to model counterfactuals and alter the state of the world with text prompts on the fly. Genie 3 also demonstrates a rich understanding of the world, capable of modeling complex physical properties such as water and lighting, simulating natural ecosystems, and generating imaginative fictional and animated worlds.
We believe that world models are a key stepping stone along the path to AGI. By making it possible to train AI agents in an unlimited curriculum of rich simulation environments, Genie 3 opens a new frontier for research in embodied AI and general-purpose agent development.
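To make the interaction pattern concrete, here is a minimal sketch of the kind of loop such a system supports: an action-conditioned frame generator that also accepts text-prompted world events mid-rollout. All names below (WorldModel, step, inject_event) are hypothetical stand-ins, not Genie 3's actual interface.

```python
# Hypothetical sketch of an interactive world-model loop; none of these
# names are Genie 3's real interface.
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Action-conditioned frame generator with promptable world events."""
    prompt: str                                   # text that defines the environment
    history: list = field(default_factory=list)   # past steps, kept for consistency

    def step(self, action):
        """Generate the next frame conditioned on the prompt, the
        interaction history, and the user's current action."""
        frame = f"frame | prompt={self.prompt!r} action={action!r}"
        self.history.append((action, frame))
        return frame

    def inject_event(self, event_text):
        """A 'promptable world event': alter the world state mid-rollout."""
        self.prompt += f"; event: {event_text}"

world = WorldModel(prompt="a coastal village at dusk")
for _ in range(24):                       # one simulated second at 24 fps
    frame = world.step(action="move_forward")
world.inject_event("a storm rolls in")    # counterfactual change on the fly
frame = world.step(action="look_up")      # subsequent frames reflect the event
```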
François Chollet is a software engineer and AI researcher, co-founder of Ndea and the ARC Prize, creator of the Keras deep-learning library and ARC-AGI benchmark, and author of Deep Learning with Python.
Large-scale pretraining has hit a plateau in true generalization. We need a new framework, and new benchmarks, for evaluating how well AI systems can infer rules, explore unfamiliar environments, and acquire new skills efficiently, as humans do. The ARC-AGI family of benchmarks is designed to measure precisely these emerging capabilities.
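As a simplified illustration of the task format such benchmarks use, an ARC-style item provides a few input/output grid pairs; the solver must infer the underlying rule and apply it to a held-out input, and scoring is exact match on the predicted grid. The `solve` stub below is a toy, not the official ARC-AGI harness.

```python
# Simplified ARC-style evaluation: infer a rule from a few examples,
# apply it to a test input, score by exact match. Illustrative only.
Grid = list[list[int]]

def solve(train_pairs: list[tuple[Grid, Grid]], test_input: Grid) -> Grid:
    """Stand-in solver: 'infer' that the rule is a transpose if every
    training pair is consistent with that hypothesis."""
    transpose = lambda g: [list(row) for row in zip(*g)]
    if all(transpose(x) == y for x, y in train_pairs):
        return transpose(test_input)
    return test_input  # fallback: identity

train = [([[1, 2]], [[1], [2]]),
         ([[3, 4, 5]], [[3], [4], [5]])]
test_in, test_out = [[7, 8]], [[7], [8]]
assert solve(train, test_in) == test_out  # exact-match scoring
```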
Chelsea Finn is an Assistant Professor of Computer Science and Electrical Engineering at Stanford University, where she heads the IRIS Lab. Her group studies how robots and other embodied agents acquire versatile skills through large-scale interaction and data, and she is also a co-founder of Pi, a startup focused on learning-enabled robot intelligence.
Prof. Finn’s efforts to endow robots with language-conditioned goals and rapid adaptation speak directly to LAW 2025’s mission of marrying language models with agent and world models: her methods show how high-level instructions can be grounded in physical control and updated on the fly.
Danijar Hafner is a Research Scientist at Google DeepMind best known for the Dreamer family of reinforcement-learning agents, which learn compact latent-space world models and use “imagination” roll-outs for long-horizon planning.
Dreamer illustrates how explicit learned simulators can make agents far more sample-efficient—an insight central to LAW 2025’s agenda of integrating world models with language-guided reasoning and action.
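A minimal sketch of the core idea, with stand-in functions rather than Dreamer's trained networks: encode the observation into a compact latent state, evaluate candidate action sequences by rolling the learned dynamics forward in imagination, and act on the best one. (Dreamer itself trains an actor-critic in imagination; the random-shooting planner here is a simplification.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in learned components (trained networks in Dreamer):
def encode(obs):                   # observation -> compact latent state
    return np.tanh(obs)

def dynamics(z, a):                # latent transition model ("imagination")
    return np.tanh(0.9 * z + 0.1 * a)

def reward(z):                     # learned reward predictor
    return float(z.sum())

def plan(obs, horizon=15, candidates=64, act_dim=2):
    """Score random action sequences purely in latent imagination and
    return the first action of the best sequence; no real env steps."""
    z0 = encode(obs)
    best_ret, best_first = -np.inf, None
    for _ in range(candidates):
        seq = rng.uniform(-1, 1, size=(horizon, act_dim))
        z, ret = z0, 0.0
        for a in seq:              # latent rollout
            z = dynamics(z, a)
            ret += reward(z)
        if ret > best_ret:
            best_ret, best_first = ret, seq[0]
    return best_first

action = plan(obs=np.zeros(2))
```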
Keyon Vafa is a postdoctoral fellow at Harvard University and an affiliate of the Laboratory for Information & Decision Systems at MIT. His research focuses on understanding and improving the implicit world models learned by generative models. He studies these questions both in traditional AI domains and in the social sciences. Keyon completed his PhD in computer science at Columbia University, where he was an NSF GRFP Fellow and the recipient of the Morton B. Friedman Memorial Prize for excellence in engineering. He also organized the NeurIPS 2024 Workshop on Behavioral Machine Learning and the ICML 2025 Workshop on Assessing World Models, and serves on the Early Career Board of the Harvard Data Science Review.
Real-world AI systems must be robust across a wide range of conditions. One path to such robustness is genuine understanding: a model having a coherent internal model of the world. But it is unclear how to measure, or even define, understanding. This talk will propose theoretically grounded definitions and metrics that test for a model's implicit understanding, or its world model. We will focus on two kinds of settings: one that tests implicit world models behaviorally, and another that tests them via their internal representations. In applications ranging from testing whether LLMs learn the rules of games to whether foundation models acquire Newtonian mechanics, we find that models can make highly accurate predictions with incoherent world models. Such incoherence creates fragility when a model attempts related but subtly different tasks. Building generative models that meaningfully capture the underlying logic of their domains would enable robust deployment; these results suggest new ways to assess and improve how close a given model is to that goal.
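The toy below illustrates the behavioral flavor of such tests, under assumptions of our own (the game, the model stub, and the metrics are illustrative, not the talk's actual setup): a model can score well on next-move prediction while treating two move sequences that reach the same underlying game state differently, revealing an incoherent implicit world model.

```python
# Illustrative behavioral probe; toy game, model stub, and metrics are
# hypothetical, not the talk's actual setup.
def true_state(seq):                 # ground truth: state is move-count parity
    return len(seq) % 2

def true_legal(seq):                 # toy rule: move "b" is legal only in state 1
    return {"a", "b"} if true_state(seq) else {"a"}

def model_legal(seq):                # stand-in for a generative model's predictions
    # Accurate on most probes, but keyed to the *last move*
    # rather than to the underlying state:
    return {"a", "b"} if seq.endswith("a") else {"a"}

probes = ["", "a", "ab", "aba", "abab", "ababa", "aa", "aab"]
accuracy = sum(model_legal(s) == true_legal(s) for s in probes) / len(probes)

# Coherence: two sequences reaching the same true state must receive
# identical predicted continuations from any coherent world model.
pairs = [(s, t) for s in probes for t in probes if true_state(s) == true_state(t)]
incoherence = sum(model_legal(s) != model_legal(t) for s, t in pairs) / len(pairs)
print(f"accuracy={accuracy:.2f}  incoherence={incoherence:.2f}")
```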
Ying Nian Wu, UCLA Department of Statistics and Data Science
AI world models typically focus on prediction, treating planning as expensive downstream inference. Hippocampal cognitive maps suggest an alternative: representations whose primary purpose is making planning computationally trivial. I present a framework where place cell populations encode multi-scale transition probabilities through geometric structure. Inner products between neural embeddings directly represent how easily one can reach any location from another, transforming navigation into simple gradient ascent—no search trees, no rollouts needed. A time-scale parameter naturally creates hierarchical representations from fine-grained local precision to coarse-grained global connectivity. Non-negativity constraints induce emergent sparsity without regularization, while efficient recursive composition enables "preplay"—discovering shortcuts before physical exploration. I discuss implications for language models, vision systems, and agent architectures, arguing that planning-ready geometric representations—not just predictive models—are essential for flexible goal-directed behavior in AI systems.
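A small numerical sketch of the planning mechanism (illustrative only; in particular, the talk's non-negative, multi-scale embeddings are replaced here by a plain SVD factorization): build a discounted multi-step transition matrix for a toy world, factor it so that inner products between embeddings encode reachability, and navigate by hill-climbing on the inner product with the goal's embedding.

```python
import numpy as np

# Toy world: a ring of N states; each move steps left or right at random.
N = 12
P = np.zeros((N, N))
for s in range(N):
    P[s, (s - 1) % N] = P[s, (s + 1) % N] = 0.5

# One time scale of multi-step transition structure (a discounted sum of
# powers of P); larger gamma = coarser, more global connectivity.
gamma = 0.8
M = (1 - gamma) * np.linalg.inv(np.eye(N) - gamma * P)

# Factor M so inner products between embeddings encode reachability.
# (The talk uses non-negative embeddings; plain SVD is a stand-in.)
U_, S, Vt = np.linalg.svd(M)
U = U_ * np.sqrt(S)          # embedding of the current state
V = Vt.T * np.sqrt(S)        # embedding of candidate target states
assert np.allclose(U @ V.T, M)

def plan(start, goal):
    """Hill-climb on the inner product with the goal's embedding:
    no search tree, no rollouts."""
    path, s = [start], start
    while s != goal:
        nbrs = [(s - 1) % N, (s + 1) % N]
        s = max(nbrs, key=lambda n: U[n] @ V[goal])
        path.append(s)
    return path

print(plan(start=0, goal=5))   # -> [0, 1, 2, 3, 4, 5]
```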
Professor Eric Xing is the President of the Mohamed bin Zayed University of Artificial Intelligence and a Professor of Computer Science at Carnegie Mellon University. His main research interests are the development of machine learning and statistical methodology, and of large-scale distributed computational systems and architectures, for solving problems involving automated learning, reasoning, and decision-making in artificial, biological, and social systems. In recent years, he has been focusing on building large language models, world models, agent models, and foundation models for biology.
Prof. Xing has served on the editorial boards of several leading journals including JASA, AOAS, and JMLR; was a recipient of several awards including the NSF CAREER Award, a Sloan Fellowship, the Carnegie Science Award, and best-paper awards at conferences such as ACL, ISMB, NeurIPS, and OSDI; and is a fellow of several societies including AAAI, ACM, ASA, IEEE, and IMS.
World models, the supposed algorithmic surrogates of the real-world environment that biological agents experience and act upon, have become an emerging topic in recent years because of the rising need to develop virtual or robotic agents with artificial (general) intelligence. In this talk, starting from the imagination in the sci-fi classic Dune and drawing inspiration from the concept of "hypothetical thinking" in the psychology literature, I discuss several schools of thought on world modeling and lay the groundwork for a Physical, Agentic, and Nested (PAN) world model whose primary goal is to simulate all actionable possibilities of the real world for purposeful reasoning and planning via thought experiment, in order to perform long-term, causal, and coherent actions toward a goal (or goals), rather than to optimize short-horizon visual metrics such as frame-level fidelity or motion realism. We propose a Generative Latent Prediction (GLP) architecture that builds on a stateful latent space, long-term and closed-loop action-conditioned latent reasoning, inference grounded in realizable world states, and training through both SSL and RL. We then present PAN, built on the GLP architecture, which brings together perception, state, action, and causality within one model to support open-domain, interactable world simulation. Extensive experiments show that PAN achieves strong performance in action-conditioned world simulation, long-horizon forecasting, and simulative reasoning compared with other video generators and world models.
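As a schematic of the GLP loop described above, with placeholder modules rather than PAN's actual architecture: perception grounds observations into a stateful latent, an action-conditioned transition rolls the latent forward in closed loop, and candidate policies are compared by their imagined long-horizon outcomes (a thought experiment) rather than by acting in the real world.

```python
import numpy as np

# Placeholder modules standing in for trained networks in a GLP-style
# world model; shapes and names are illustrative, not PAN's actual code.
def perceive(obs):                 # ground an observation into a latent state
    return np.tanh(obs)

def transition(z, action):         # action-conditioned latent prediction
    return np.tanh(0.8 * z + 0.2 * action)

def decode(z):                     # ground the latent in a realizable world state
    return z                       # identity placeholder

def simulate(obs, policy, horizon):
    """Closed-loop thought experiment: roll the latent model forward,
    feeding each imagined world state back into the policy."""
    z, trajectory = perceive(obs), []
    for _ in range(horizon):
        action = policy(decode(z))
        z = transition(z, action)
        trajectory.append(decode(z))
    return trajectory

# Simulative reasoning: rank candidate policies by their imagined
# long-horizon outcomes instead of acting in the real world.
goal = np.ones(4)
def score(policy):
    final = simulate(obs=np.zeros(4), policy=policy, horizon=50)[-1]
    return -np.linalg.norm(final - goal)

# Three constant candidate policies (each ignores the state, for brevity):
policies = [lambda s, a=a: np.full(4, a) for a in (-1.0, 0.0, 1.0)]
best = max(policies, key=score)
```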
Sherry Yang is a Staff Research Scientist at Google DeepMind and will soon join New York University as an Assistant Professor. Her recent work focuses on scaling and aligning large language models, multimodal reasoning, and evaluating multi-agent interactions.
Evaluating robot control policies is difficult: real-world testing is costly, and handcrafted simulators require manual effort to improve in realism and generality. We propose a world-model-based policy evaluation environment (WorldGym), an autoregressive, action-conditioned video generation model that serves as a proxy for real-world environments. Policies are evaluated via Monte Carlo rollouts in the world model, with a vision-language model providing rewards. We evaluate a set of VLA-based real-robot policies in the world model using only initial frames from real robots, and show that policy success rates within the world model correlate highly with real-world success rates. Moreover, we show that WorldGym preserves relative policy rankings across different policy versions, sizes, and training checkpoints. Because it requires only a single start frame as input, the world model further enables efficient evaluation of robot policies' generalization to novel tasks and environments. We find that modern VLA-based robot policies still struggle to distinguish object shapes and can be distracted by adversarial facades of objects. While generating highly realistic object interactions remains challenging, WorldGym faithfully emulates robot motions and offers a practical starting point for safe and reproducible policy evaluation before deployment.
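A schematic of this evaluation recipe, with hypothetical interfaces (world_model.step, vlm_reward, and the policy signature are stand-ins, not WorldGym's released API): estimate each policy's success rate from Monte Carlo rollouts inside the world model starting from real initial frames, then check that those estimates preserve the real-world ordering via rank correlation.

```python
from statistics import mean

def rollout_success(policy, world_model, vlm_reward, start_frame, horizon=50):
    """One Monte Carlo rollout inside the learned world model; a
    vision-language model judges success from the generated frames."""
    frame, frames = start_frame, [start_frame]
    for _ in range(horizon):
        action = policy(frame)
        frame = world_model.step(frame, action)   # action-conditioned generation
        frames.append(frame)
    return vlm_reward(frames)                     # e.g. 1.0 if judged successful

def evaluate(policy, world_model, vlm_reward, start_frames, n_rollouts=10):
    """Estimate a policy's success rate using only real initial frames."""
    return mean(rollout_success(policy, world_model, vlm_reward, f)
                for f in start_frames for _ in range(n_rollouts))

def spearman(xs, ys):
    """Rank correlation: does the world model preserve the real-world
    ordering of policies? (Assumes no ties, for brevity.)"""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / var

# sim_rates = [evaluate(p, world_model, vlm_reward, start_frames) for p in policies]
# spearman(sim_rates, real_rates)   # high value = rankings preserved
```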
Contact: law2025@googlegroups.com