Recent years have witnessed rapid advances in leveraging Large Language Models (LLMs), Vision-Language Models (VLMs), Vision-Language-Action Models (VLAs), and Diffusion Models for robot task and motion planning. These foundation models are revolutionizing robotics by fusing visual interpretation with natural language understanding, enabling more intuitive, adaptable, and generalizable interactions with both the environment and human users.
Traditional symbolic and neuro-symbolic approaches have long been fundamental to robot planning, providing strong guarantees, interpretability, and structured reasoning. Meanwhile, foundation models introduce data-driven flexibility, enabling few-shot learning, open-ended reasoning, and generalization across diverse tasks. However, integrating these paradigms remains an open challenge: How can symbolic reasoning and foundation models complement each other? What are the trade-offs in robustness, efficiency, and adaptability?
This workshop will bring together researchers and practitioners from robotics, planning, machine learning, formal methods, foundation models, and human-robot interaction to explore the evolving role of foundation models in robot planning, examining their strengths, limitations, and synergies with established planning techniques. Through invited talks, technical paper presentations, interactive sessions, and panel discussions, we hope to synthesize recent advances from diverse perspectives and lead in-depth discussions toward a shared vision of the key open problems in the area.
Task vs. Motion Planning
How do we define robot planning, e.g., AI planning, task and motion planning (TAMP), POMDP planning, motion planning, and value iteration?
How should task and motion planning be coupled or decoupled?
How can we bridge different planning paradigms?
Where should the boundary be drawn, and how does it affect symbolic and foundation model-based approaches?
Foundation Models in Planning
Where do foundation models provide the most value, e.g., as a planner, alongside a classical planner, as an intermediary between language and symbolic representations, or as a generator of symbolic specifications or code?
What advancements in foundation models can help robot planning?
How can we integrate structured reasoning with foundation models?
Trade-offs of Using Foundation Models
How do foundation models compare with traditional symbolic and neuro-symbolic planners in terms of scalability, efficiency, interpretability, and adaptability?
What are their key advantages and limitations?
Data Bottlenecks for Robot Planning Models
Can large-scale planning datasets be built with minimal human intervention?
Can self-supervised or synthetic data approaches bridge the gap?
Training Frameworks for Robot Planning Models
What are the key considerations in designing training pipelines?
How should latency and hardware constraints shape architectures and strategies?
Scaling VLAs to Complex Scenarios
How can VLAs handle long-horizon planning and multi-robot coordination?
What adaptations enable VLAs to generalize to diverse, dynamic environments?
Challenges in Real-World Deployment
How can we ensure generalization, safety, and efficiency in dynamic environments?
How do we manage perception errors, sensor noise, and computational costs?
Stanford University, USA
Arizona State University, USA
Physical Intelligence, USA
University of Toronto, Canada
Stanford University, Physical Intelligence, USA
Stanford University, Physical Intelligence, USA
Tsinghua University, China
Columbia University, USA
Princeton University, USA
George Mason University, USA
MIT, USA
Harvard University and MIT, USA
Harvard University and MIT, USA
Brown University, USA
Northeastern University, USA
MIT, USA
Harvard, USA
Na Li
Harvard University, USA
Lawson Wong
Northeastern University, USA
Brown University, USA
Chuchu Fan
MIT, USA
Every paper received at least two reviews. We thank all our reviewers for their help!
Yongchao Chen
Jason Xinyu Liu
Linfeng Zhao