We propose a safety layer, placed after a robot foundation policy, that enforces action constraints to guarantee safe state transitions. This provides formal safety guarantees for generalist policies without requiring extensive demonstrations of safe behavior or any safety-specific fine-tuning.
We make any robot foundation model (RFM) inherently safe at the architectural level by combining it with the ATACOM safety layer. The key idea of ATACOM is to generate a safe action space in which arbitrary actions can be sampled while the safety constraints remain satisfied. This is achieved by constructing the constraint manifold, computing its tangent space at the current robot configuration, and using this tangent space as the safe action space. By taking actions in the tangent space of the constraint manifold, we generate paths that stay on the manifold, corresponding to safe trajectories of the robot.
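The tangent-space idea above can be sketched in a few lines of NumPy: the tangent space of the constraint manifold is the null space of the constraint Jacobian, so any action expressed in that basis leaves the constraint values unchanged to first order. The function names and the toy linear constraint below are our own illustration, not ATACOM's actual implementation:

```python
import numpy as np

def tangent_space_basis(J_c):
    """Orthonormal basis of the null space of the constraint Jacobian J_c.

    Motions in this basis stay (to first order) on the constraint
    manifold, i.e. they do not change the constraint values.
    """
    # SVD: the rows of Vt beyond rank(J_c) span the null space of J_c.
    U, S, Vt = np.linalg.svd(J_c)
    rank = int(np.sum(S > 1e-10))
    return Vt[rank:].T  # columns span the tangent space

def safe_velocity(J_c, alpha):
    """Map an arbitrary policy action `alpha` to a joint velocity that
    moves along the tangent space of the constraint manifold."""
    B = tangent_space_basis(J_c)
    return B @ alpha[: B.shape[1]]

# Toy example: one linear constraint c(q) = q_1 + q_2 = const in 3-DoF space.
J_c = np.array([[1.0, 1.0, 0.0]])
qdot = safe_velocity(J_c, np.array([0.3, -0.7]))
assert np.allclose(J_c @ qdot, 0.0)  # tangent motion leaves c(q) unchanged
```

For a real manipulator, `J_c` would stack the Jacobians of all active constraints evaluated at the current configuration.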
Using the ATACOM framework, we can impose a wide variety of constraints. Several core safety constraints can be defined using only the robot’s kinematic model to ensure a general notion of safety. Joint limit constraints prevent the robot from exceeding its mechanical bounds while still allowing control near those limits. Workspace constraints keep the robot inside a designated region, so that cameras and other essential components can be positioned safely within the scene. However, manually defining constraints for the RFM is time-consuming and often requires expert knowledge and environmental information that is not readily available. In the following, we present an intuitive, cost-effective, and lightweight approach to semi-automatically generating constraints for obstacles in the visual scene.
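As a rough illustration of how such kinematic constraints can be written, the sketch below encodes joint limits and an axis-aligned workspace box as inequality constraints c(q) ≤ 0. The function names and the box parameterization are our own, not part of ATACOM:

```python
import numpy as np

def joint_limit_constraint(q, q_min, q_max):
    """Inequality constraints c(q) <= 0, satisfied iff every joint
    position stays within its mechanical limits."""
    return np.concatenate([q_min - q, q - q_max])

def workspace_constraint(ee_pos, lower, upper):
    """Keeps the end-effector position inside an axis-aligned
    workspace box [lower, upper]."""
    return np.concatenate([lower - ee_pos, ee_pos - upper])
```

A configuration is safe when all entries of the stacked constraint vector are non-positive; ATACOM operates on such constraint functions and their Jacobians.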
We leverage 2D instance segmentation using SAM2 and lift the obtained multi-view segmentation masks into 3D using the pinhole camera equation and the camera intrinsics. Based on the 3D instance segmentation, we compute minimum bounding boxes for each obstacle in the scene (Middle Figure). To obtain distance estimates between the robot and the bounding boxes, we spawn spheres at key robot positions that cover the manipulator’s hull (Left Figure). To ensure safety, ATACOM guarantees that the distance between each sphere’s hull and the obstacle’s bounding box remains positive (Right Figure).
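A minimal sketch of the lifting step, assuming per-view depth images aligned with the segmentation masks. The function names are hypothetical, and for brevity the sketch computes an axis-aligned box in the camera frame, whereas the actual pipeline computes minimum bounding boxes per obstacle:

```python
import numpy as np

def backproject(mask, depth, K):
    """Lift a 2D instance mask into a 3D point cloud in the camera frame.

    mask : (H, W) boolean instance mask (e.g. from SAM2)
    depth: (H, W) depth image in meters, aligned with the mask
    K    : (3, 3) camera intrinsics matrix
    """
    v, u = np.nonzero(mask)               # pixel coordinates of the instance
    z = depth[v, u]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Inverted pinhole camera equation: x = (u - cx) * z / fx, etc.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def aabb(points):
    """Axis-aligned bounding box (min corner, max corner) of a point cloud."""
    return points.min(axis=0), points.max(axis=0)
```

Merging the backprojected point clouds from multiple calibrated views (after transforming them into a common world frame) yields the per-obstacle clouds from which the bounding boxes are computed.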
Spheres cover the robot’s hull at critical areas.
Bounding boxes of obstacles are generated from 2D instance segmentation and depth information.
The distance between the spheres and the obstacle’s bounding box is calculated by projecting the sphere’s center into the bounding box’s coordinate frame and estimating the distance to the bounding box’s hull.
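This distance computation can be sketched as follows. The pose convention (a 4×4 transform of the box frame) is our own assumption, and for simplicity the sketch returns the center-to-hull distance minus the sphere radius, without resolving penetration depth once the center is inside the box:

```python
import numpy as np

def sphere_box_distance(center, radius, box_T, half_extents):
    """Signed distance between a sphere's hull and an oriented bounding box.

    center       : (3,) sphere center in the world frame
    radius       : sphere radius
    box_T        : (4, 4) pose of the box frame in the world frame
    half_extents : (3,) half side lengths of the box

    Positive values mean the sphere hull is clear of the box; the
    safety layer enforces that this distance stays positive.
    """
    # Project the sphere center into the box's coordinate frame.
    R, t = box_T[:3, :3], box_T[:3, 3]
    p = R.T @ (center - t)
    # Closest point on the box hull: clamp to the box extents.
    closest = np.clip(p, -half_extents, half_extents)
    return np.linalg.norm(p - closest) - radius

# Unit box at the origin, sphere of radius 0.5 centered at x = 3:
# the closest hull point is (1, 0, 0), so the distance is 2.0 - 0.5 = 1.5.
d = sphere_box_distance(np.array([3.0, 0.0, 0.0]), 0.5,
                        np.eye(4), np.array([1.0, 1.0, 1.0]))
```

Evaluating this function for every sphere–box pair yields the distance constraints that ATACOM keeps positive.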
This task requires picking a specified (plastic) fruit off the table and placing it in the box. The potential safety risk is colliding with the table surface, especially with smaller fruits such as the strawberry.
This task requires picking up a tennis ball off the table and placing it in the box. The ball is placed close to a cardboard box, which serves as an obstacle. During the pickup, the robot must simultaneously avoid collisions with both the obstacle and the table.
This task requires the robot to pick up a specified object off the table and place it in the box. We add up to three obstacles to the workspace that the robot needs to avoid during operation.
We perform our AirHockey experiments in the MuJoCo simulator with a fine-tuned OCTO model. We fine-tune OCTO to predict Cartesian velocity action chunks in the 2D Cartesian space of the mallet end-effector on the AirHockey table. The model is evaluated on the task of hitting a randomly initialized puck into the goal on the opposite side of the AirHockey table.
We evaluate the OCTO policy with and without the safety layer on the air hockey hitting task at different checkpoints during fine-tuning. We report the maximum constraint violation and the success rate of hitting the puck into the goal, evaluated over 500 episodes in simulation. With the ATACOM safety layer added, the policy remains compliant with the safety constraints throughout fine-tuning, whereas the unmodified OCTO policy continues to violate the safety limits. Both policies progressively improve their success rates over the course of fine-tuning.
The fine-tuned OCTO model strikes the AirHockey table and heavily violates the safety constraints. With our safety layer added, OCTO safely hits the puck into the goal.
After evaluation in simulation, we deployed OCTO with our safety layer on our real-world AirHockey setup. As the policy was trained in simulation with additional puck state information, we use the OptiTrack system to track the puck’s position and velocity. Using proprioception and the tracked puck state, we reconstruct the real-world state in MuJoCo to obtain the simulated visual input observation.