Location: Georgia World Congress Center, Room 411
9:00 am - 9:15 am
Welcome and opening remarks
9:15 am - 9:45 am
Keynote by Marco Pavone: Ensuring Physical AI Safety in AI-enabled Autonomous Systems
9:45 am - 10:00 am
Startup Spotlight - Teleo Inc. (Sagar Manglani): Secure and Scalable Auto-Annotation Using Local Vision-Language Models in Robotics
10:00 am - 10:30 am
Spotlight Session I
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Versatile Legged Locomotion Adaptation through Vision-Language Grounding
Towards Safe Robot Foundation Models Using Inductive Biases
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards
Human-in-the-loop Foundation Model Failure Recovery for Robot-Assisted Bite Acquisition
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
10:30 am - 11:00 am
Poster Session I & Coffee Break
11:00 am - 11:30 am
Oral Session I - Visual and Textual Robustness of Vision-Language-Action Models
Oral Talk 1 - Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Oral Talk 2 - Let's Talk About Language! Investigating Linguistic Diversity in Embodied AI Datasets
11:30 am - 12:00 pm
Keynote by Subbarao Kambhampati: Style vs. Correctness Considerations in Using LLMs/VLMs/LRMs for Task Planning
12:00 pm - 1:00 pm
Lunch break
1:00 pm - 1:30 pm
Keynote by Mingxing Tan: Waymo Research: VLMs for E2E Autonomous Driving
1:30 pm - 2:00 pm
Keynote by Mark Riedl: Aligning Agents and Users with Explainable AI
2:00 pm - 3:00 pm
Spotlight Session II
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Adaptive Energy Regularization for Autonomous Gait Transition & Energy-Efficient Quadruped Locomotion
Residual Policy Gradient: A Reward View of KL-regularized Objective
OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture
Foundation Model Embedding-Based Semantic Anomaly Detection
KitchenVLA: Iterative Vision-Language Corrections for Robotic Execution of Human Tasks
MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models
3:00 pm - 3:30 pm
Poster Session II & Coffee Break
3:30 pm - 4:00 pm
Oral Session II - Alignment and Generalization
Oral Talk 3 - Adapting Diffusion Policies to Human Preferences via Reward-Guided Fine-Tuning
Oral Talk 4 - Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
4:00 pm - 4:30 pm
Keynote by Hadas Kress-Gazit: Evaluating Foundation Models is hard. Do it anyway!
4:30 pm - 5:00 pm
Keynote by Sumeet Singh: A Layered Approach for Safe Vision Language Action Models
5:00 pm - 5:15 pm
Concluding remarks