SafeVLMs@ICRA25

Safely Leveraging Vision-Language Foundation Models in Robotics:
Challenges and Opportunities

May 23, 2025, Atlanta, Georgia
Workshop @ International Conference on Robotics and Automation (ICRA)

Location: Georgia World Congress Center, Room 411

Workshop Overview

Vision and language foundation models pre-trained on internet-scale data have started to revolutionize how robots like mobile manipulators or autonomous cars understand the complex visual world, interpret human feedback, and plan actions. The promise of using these models within robot decision-making is the ability to generalize robot behaviors and push robot deployment into increasingly unstructured or novel environments. At the same time, we must face the fact that robots are safety-critical systems, wherein a foundation model’s single erroneous visual or language interpretation, misaligned behavior generation, or high inference latency can lead to catastrophic consequences. How can we safely leverage vision-language foundation models to expand robot deployment?

This workshop explores frontier safety topics that emerge throughout the life-cycle of foundation models in robotics, ranging from

training (wherein we need to collect and quantify what kinds of embodied data will enable the desired robotics capabilities)
to fine-tuning (wherein we must align these models with human stakeholders)
to deployment (wherein these models must run in real-time, reliably detect out-of-distribution scenarios, and confidently hand over control to fallback strategies).

This workshop aims to:

raise open questions on emerging safety issues when robots relying on vision-language foundation models operate in real-world environments alongside humans
encourage conversation between perception, robotics, efficient AI, and control theory on safely and efficiently adapting foundation models in robotics
provide a forum for discussion among researchers, industry, and regulators as to the core challenges, promising solution strategies, fundamental limitations, and regulatory realities involved in deploying foundation models in robotics applications

Invited Speakers and Panelists

Marco Pavone

Professor, Stanford University

Marco Pavone is an Associate Professor of Aeronautics and Astronautics at Stanford University, where he is currently on partial leave, and the Director of Autonomous Vehicle Research at NVIDIA. His main research interests are in the development of methodologies for the analysis, design, and control of autonomous systems, with an emphasis on self-driving cars, autonomous aerospace vehicles, and future mobility systems. At Stanford, he is also the Director of the Autonomous Systems Laboratory and Co-Director of the Center for Automotive Research at Stanford. He received a Ph.D. degree in Aeronautics and Astronautics from the Massachusetts Institute of Technology in 2010. He is a recipient of a number of awards, including a Presidential Early Career Award for Scientists and Engineers from President Barack Obama, an Office of Naval Research Young Investigator Award, a National Science Foundation Early Career (CAREER) Award, a NASA Early Career Faculty Award, and an Early-Career Spotlight Award from the Robotics: Science and Systems Foundation. He was identified by the American Society for Engineering Education (ASEE) as one of America's 20 most highly promising investigators under the age of 40. He is currently serving as an Associate Editor for the IEEE Control Systems Magazine.

Sumeet Singh

Research Scientist, Google DeepMind Robotics

Sumeet is a Senior Research Scientist with Robotics @ Google DeepMind. His core research spans dexterous manipulation, planning and control, and multi-system Vision-Language-Action (VLA) frameworks, with broader applicability of the algorithmic innovations. More recently, his work has focused on generative modeling, specifically on unifying and improving modeling frameworks such as diffusion and energy-based models (EBMs) using tools drawn from game theory and nonlinear control. His notable awards include the Stanford Graduate Fellowship, Qualcomm Innovation Fellowship, and the IROS RoboCup Best Paper Award.

Subbarao Kambhampati

Professor, ASU

Subbarao Kambhampati is a professor of computer science at Arizona State University. Kambhampati studies fundamental problems in planning and decision making, motivated in particular by the challenges of human-aware AI systems. He is a fellow of the Association for the Advancement of Artificial Intelligence, the American Association for the Advancement of Science, and the Association for Computing Machinery. He served as the president of the Association for the Advancement of Artificial Intelligence, a trustee of the International Joint Conference on Artificial Intelligence, the chair of AAAS Section T (Information, Communication and Computation), and a founding board member of Partnership on AI. Kambhampati’s research as well as his views on the progress and societal impacts of AI have been featured in multiple national and international media outlets. He can be followed on Twitter @rao2z.

Hadas Kress-Gazit

Professor, Cornell

Hadas Kress-Gazit is the Geoffrey S.M. Hedrick Sr. Professor at the Sibley School of Mechanical and Aerospace Engineering, and the Associate Dean of Engineering for Diversity and Academic Affairs at Cornell University. She received her Ph.D. in Electrical and Systems Engineering from the University of Pennsylvania in 2008 and has been at Cornell since 2009. Her research focuses on formal methods for robotics and automation and more specifically on synthesis for robotics – automatically creating verifiable robot controllers for complex high-level tasks. Her group explores different types of robotic systems including modular robots, soft robots and swarms and synthesizes ideas from different communities such as robotics, formal methods, control, and hybrid systems. She is an IEEE fellow and has received multiple awards for her research, teaching and advocacy for groups traditionally underrepresented in STEM. She lives in Ithaca with her partner and two kids.

Mingxing Tan

Director, Waymo Research

Mingxing is a Director and Head of Perception Research at Waymo. His work focuses on foundation models for perception and end-to-end driving, coupled with cutting-edge research on LLM/VLM, Diffusion, NeRF/3DGS, and 3D perception. Prior to joining Waymo, he was a research scientist at Google Brain, where he worked on AutoML and developed several popular networks such as EfficientNets.

Mark Riedl

Professor, Georgia Tech

Dr. Mark Riedl is a Professor in the Georgia Tech School of Interactive Computing and Associate Director of the Georgia Tech Machine Learning Center. Dr. Riedl’s research focuses on human-centered artificial intelligence—the development of artificial intelligence and machine learning technologies that understand and interact with human users in more natural ways. Dr. Riedl’s recent work has focused on story understanding and generation, computational creativity, explainable AI, and teaching virtual agents to behave safely. His research is supported by the NSF, DARPA, ONR, the U.S. Army, U.S. Health and Human Services, Disney, Google, Meta, and Amazon. He is the recipient of a DARPA Young Faculty Award and an NSF CAREER Award.

Start-up Spotlight Talk

Sagar Manglani

Perception Tech Lead, Teleo Inc.

Sagar Manglani is the Perception Lead at Teleo Inc., where he is advancing autonomy for heavy machinery in construction by developing vision perception models for off-road environments. Prior to Teleo, he was a Senior Engineer at Ford, and he studied artificial intelligence at Stanford University.

Organizers

Ran (Thomas) Tian
UC Berkeley

Andrea Bajcsy
CMU

Rohan Sinha
Stanford University

Andi Peng
MIT / Anthropic

Ed Schmerling
NVIDIA Research

Kratarth Goel
Waymo Research

Acknowledgements

We are tremendously grateful to all the reviewers of the workshop submissions, without whom we could not have released timely decisions. They are, in alphabetical order: Arun L Bishop, Milan Ganai, Hyun Joe Jeong, Chenran Li, Francesco Marchiori, Kensuke Nakamura, Ravi Pandya, Max Peter Ronecker, Saumya Saxena, Junwon Seo, Shuhan Tan, Changhao Wang, Yixiao Wang, Yichen Xie, Yilin Wu, Haochen Zhang, Michelle D Zhao.

Contact

For any questions about the workshop, please email safevlms@gmail.com.
