Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose severe safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this question with an integrated safety approach (ISA) that systematically models safety requirements, actively elicits diverse unsafe behaviors, effectively constrains VLA policies via safe reinforcement learning, and rigorously assures their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Policies aligned through this approach achieve the following key properties: (I) effective \textbf{safety-performance trade-offs}: ISA yields an 83.58\% safety improvement over the current state-of-the-art method while maintaining task performance (+3.85\%); (II) strong \textbf{safety assurance}, including the ability to mitigate long-tail risks and handle extreme failure scenarios; and (III) robust \textbf{generalization} of learned safety behaviors to various out-of-distribution (OOD) perturbations.
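Concretely, this min-max optimization can be sketched as the standard Lagrangian relaxation of a CMDP (the notation below is illustrative rather than the paper's exact objective):
\[
\max_{\theta}\;\min_{\lambda \ge 0}\;
\mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t}\gamma^{t} r_t\right]
-\lambda\!\left(\mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t}\gamma^{t} c_t\right]-d\right),
\]
where $\pi_\theta$ is the VLA policy, $r_t$ the task reward, $c_t$ the safety cost incurred by elicited unsafe behaviors, and $d$ the tolerated cost budget.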
During routine navigation, the robot operates safely and stably, and it remains composed and flexible in more complex environments. During long-distance exploration, it stays cautious around all objects in the environment, keeping a safe distance. When entering narrow corners, the robot slows down, preserves a safe motion space, and adjusts its direction promptly.
Robots can flexibly explore narrow spaces while maintaining a safe distance from the surrounding environment.
The robot anticipates small environmental objects in advance, passing them without hesitation or collision.
The robot behaves more stably, slowing down to observe the environment rather than rushing toward potential target objects.
The robot's environmental perception capability has improved, allowing it to approach and grasp objects in a safe manner.
SafeVLA maintains safety and task performance in the presence of OOD perturbations.
The robot correctly identified the target (the mug) but ignored surrounding objects (e.g., the coffee machine) during execution, repeatedly and forcefully colliding with them while attempting the grasp. Although the task was completed, the environment suffered substantial damage, indicating a very low level of safety in task execution.
The robot mistook the wrong object (a gas cylinder) for the actual target (a vase) and repeatedly interacted with it, shaking it and colliding with it heavily. As a result, the task was not completed and the environment was left in a highly dangerous state.
During task execution, the robot interacted with hazardous items (e.g., a knife) and, while exploring the path, also made contact with non-essential environmental objects (e.g., a chair), causing the knife on the table to fall.
Although it identifies the bed correctly, the robot becomes stuck on a door handle, fails to recognize the hazard, and continues colliding with it for over 20 seconds.
When the target is lost, the robot searches repeatedly within a small area, increasing the risk of unsafe interactions.
The robot incorrectly identifies the target (a cabinet instead of a bed) and, lacking safe reinforcement learning constraints, follows the shortest path, ignoring obstacles such as doorframes, especially when they are outside the camera's view.
Our experiments demonstrate that SafeVLA successfully decouples the two optimization objectives of safety and task performance, optimizing safety as an independent dimension. This results in the highest task performance and the lowest cumulative cost across all tasks, outperforming existing SOTA methods in both safety and task performance, with improvements of 83.58% and 3.85%, respectively. Furthermore, the improvements in safety and task performance achieved through SafeVLA alignment remain stable even in highly OOD scenarios.
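For reference, the cumulative cost reported here is assumed to follow the usual trajectory-level convention: for a trajectory $\tau=(s_0,a_0,\dots,s_T)$,
\[
C(\tau)=\sum_{t=0}^{T} c_t,
\]
where $c_t$ is the per-step safety cost (e.g., collisions or contact with hazardous objects), so lower values indicate safer execution.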
This table provides a quantitative evaluation of model performance and safety for each method.
This table shows the impact of OOD scenarios on the performance and safety gains from alignment.
This figure illustrates the distribution changes in cumulative cost of robot trajectories after SafeVLA alignment.