Recent years have witnessed rapid advances in leveraging Large Language Models (LLMs), Vision-Language Models (VLMs), Vision-Language-Action Models (VLAs), and Diffusion Models for robot task and motion planning. These foundation models are revolutionizing robotics by fusing visual interpretation with natural language understanding, enabling more intuitive, adaptable, and generalizable interactions with both the environment and human users.
Traditional symbolic and neuro-symbolic approaches have long been fundamental to robot planning, providing strong guarantees, interpretability, and structured reasoning. Meanwhile, foundation models introduce data-driven flexibility, enabling few-shot learning, open-ended reasoning, and generalization across diverse tasks. However, integrating these paradigms remains an open challenge: How can symbolic reasoning and foundation models complement each other? What are the trade-offs in robustness, efficiency, and adaptability?
This workshop will bring together researchers and practitioners from robotics, planning, machine learning, formal methods, foundation models, and human-robot interaction to explore the evolving role of foundation models in robot planning, examining their strengths, limitations, and synergies with established planning techniques. Through invited talks, technical paper presentations, interactive sessions, and panel discussions, we hope to synthesize recent advances from diverse perspectives and lead in-depth discussions toward a shared vision of the key open problems in the area.
Task vs. Motion Planning
How do we define robot planning, e.g., AI planning, task and motion planning (TAMP), POMDP planning, motion planning, and value iteration?
How should task and motion planning be coupled or decoupled?
How can we bridge different planning paradigms?
Where should the boundary be drawn, and how does it affect symbolic and foundation model-based approaches?
Foundation Models in Planning
Where do foundation models provide the most value, e.g., as a planner, alongside a classical planner, as an intermediary between language and symbolic representations, or as a generator of symbolic specifications or code?
What advancements in foundation models can help robot planning?
How can we integrate structured reasoning with foundation models?
Trade-offs of Using Foundation Models
How do foundation models compare with traditional symbolic and neuro-symbolic planners in terms of scalability, efficiency, interpretability, and adaptability?
What are their key advantages and limitations?
Data Bottlenecks for Robot Planning Models
Can large-scale planning datasets be built with minimal human intervention?
Can self-supervised or synthetic data approaches bridge the gap?
Training Frameworks for Robot Planning Models
What are the key considerations in designing training pipelines?
How should latency and hardware constraints shape architectures and strategies?
Scaling VLAs to Complex Scenarios
How can VLAs handle long-horizon planning and multi-robot coordination?
What adaptations enable VLAs to generalize to diverse, dynamic environments?
Challenges in Real-World Deployment
How can we ensure generalization, safety, and efficiency in dynamic environments?
How do we manage perception errors, sensor noise, and computational costs?
Stanford University, USA
Arizona State University, USA
Physical Intelligence, USA
University of Toronto, Canada
Stanford University, Physical Intelligence, USA
Stanford University, Physical Intelligence, USA
Tsinghua University, China
Columbia University, USA
Princeton University, USA
George Mason University, USA
MIT, USA
Harvard University and MIT, USA
Harvard University and MIT, USA
Brown University, USA
Northeastern University, USA
MIT, USA
Harvard, USA
Na Li
Harvard University, USA
Lawson Wong
Northeastern University, USA
Brown University, USA
Chuchu Fan
MIT, USA
Every paper received at least two reviews. We thank all our reviewers for their help!
Yongchao Chen
Jason Xinyu Liu
Linfeng Zhao