Abstract
Real-world robotic systems frequently require diverse end-effectors for different tasks; however, most existing grasp detection methods are optimized for a single gripper type, demanding retraining or re-optimization for each novel gripper configuration. This gripper-specific retraining paradigm is neither scalable nor practical. We propose XGrasp, a real-time gripper-aware grasp detection framework that generalizes to novel gripper configurations without additional training or optimization. To address data scarcity, we augment existing single-gripper datasets with multi-gripper annotations by incorporating the physical characteristics and closing trajectories of diverse grippers. Each gripper is represented as a two-channel 2D image encoding its static shape (Gripper Mask) and its dynamic closing trajectory (Gripper Path). XGrasp employs a hierarchical two-stage architecture consisting of a Grasp Point Predictor (GPP) and an Angle-Width Predictor (AWP). In the AWP, contrastive learning with a quality-aware anchor builds a gripper-agnostic embedding space, enabling generalization to novel grippers without additional training. Experimental results demonstrate that XGrasp outperforms existing gripper-aware methods in both grasp success rate and inference speed across diverse gripper types.
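The two-channel gripper representation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the assumption that both channels are binary H×W rasters are ours.

```python
import numpy as np

def encode_gripper(gripper_mask: np.ndarray, gripper_path: np.ndarray) -> np.ndarray:
    """Stack the static shape (Gripper Mask) and the rasterized closing
    trajectory (Gripper Path) into a two-channel 2D image.

    Both inputs are assumed to be binary arrays of identical shape (H, W);
    the channel ordering (mask first, path second) is illustrative.
    """
    assert gripper_mask.shape == gripper_path.shape, "channels must align"
    return np.stack(
        [gripper_mask.astype(np.float32), gripper_path.astype(np.float32)],
        axis=0,
    )  # shape: (2, H, W)
```

Such a fixed-size image encoding lets a single convolutional backbone consume arbitrary gripper geometries alongside the scene image.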
Method
Overview of the XGrasp framework: (a) the Grasp Point Predictor (GPP) localizes the optimal grasp point from the full scene image and the gripper input, (b) the Angle-Width Predictor (AWP) determines the grasp angle and width from a cropped scene patch, and (c) the AWP is trained with a triplet loss and a quality-aware anchor to build an embedding space that generalizes across gripper types.
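The triplet-loss training with a quality-aware anchor in (c) might be sketched as below. This is an assumed formulation, not the authors' code: we take "quality-aware" to mean each anchor's contribution is weighted by its grasp-quality score, and the margin value is a placeholder.

```python
import torch
import torch.nn.functional as F

def quality_aware_triplet_loss(
    emb_anchor: torch.Tensor,   # (N, D) anchor embeddings
    emb_positive: torch.Tensor, # (N, D) same-grasp-outcome embeddings
    emb_negative: torch.Tensor, # (N, D) different-outcome embeddings
    quality: torch.Tensor,      # (N,) grasp-quality scores in [0, 1]
    margin: float = 0.2,
) -> torch.Tensor:
    """Standard triplet margin loss, with each triplet weighted by the
    anchor's grasp-quality score (our interpretation of 'quality-aware')."""
    d_pos = F.pairwise_distance(emb_anchor, emb_positive)
    d_neg = F.pairwise_distance(emb_anchor, emb_negative)
    per_triplet = F.relu(d_pos - d_neg + margin)  # hinge on the margin
    return (quality * per_triplet).mean()
```

Weighting by quality down-weights low-confidence anchors, so the embedding space is shaped primarily by reliable grasp annotations.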
Experiments
Jacquard Dataset
Simulation
Real-World
Ablation Study (Dataset)
Ablation Study (Input Gripper Features)
Ablation Study (Loss Functions)