Location: Georgia World Congress Center, Room 411
9:00 am - 9:15 am
Welcome and opening remarks
9:15 am - 9:45 am
Keynote by Marco Pavone: Ensuring Physical AI Safety in AI-enabled Autonomous Systems
9:45 am - 10:00 am
Startup Spotlight - Teleo Inc. (Sagar Manglani): Secure and Scalable Auto-Annotation Using Local Vision-Language Models in Robotics
10:00 am - 10:30 am
Spotlight Session I
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Versatile Legged Locomotion Adaptation through Vision-Language Grounding
Towards Safe Robot Foundation Models Using Inductive Biases
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards
Human-in-the-loop Foundation Model Failure Recovery for Robot-Assisted Bite Acquisition
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
10:30 am - 11:00 am
Poster Session I & Coffee Break
11:00 am - 11:30 am
Oral Session I - Visual and Textual Robustness of Vision-Language-Action Models
Oral Talk 1 - Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Oral Talk 2 - Let's Talk About Language! Investigating Linguistic Diversity in Embodied AI Datasets
11:30 am - 12:00 pm
Keynote by Subbarao Kambhampati: Style vs. Correctness Considerations in Using LLMs/VLMs/LRMs for Task Planning
12:00 pm - 1:00 pm
Lunch break
1:00 pm - 1:30 pm
Keynote by Mingxing Tan: Waymo Research: VLMs for E2E Autonomous Driving
1:30 pm - 2:00 pm
Keynote by Mark Riedl: Aligning Agents and Users with Explainable AI
2:00 pm - 3:00 pm
Spotlight Session II
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Adaptive Energy Regularization for Autonomous Gait Transition & Energy-Efficient Quadruped Locomotion
Residual Policy Gradient: A Reward View of KL-regularized Objective
OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture
Foundation Model Embedding-Based Semantic Anomaly Detection
KitchenVLA: Iterative Vision-Language Corrections for Robotic Execution of Human Tasks
MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models
3:00 pm - 3:30 pm
Poster Session II & Coffee Break
3:30 pm - 4:00 pm
Oral Session II - Alignment and Generalization
Oral Talk 3 - Adapting Diffusion Policies to Human Preferences via Reward-Guided Fine-Tuning
Oral Talk 4 - Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
4:00 pm - 4:30 pm
Keynote by Hadas Kress-Gazit: Evaluating Foundation Models is hard. Do it anyway!
4:30 pm - 5:00 pm
Keynote by Sumeet Singh: A Layered Approach for Safe Vision Language Action Models
5:00 pm - 5:15 pm
Concluding remarks