One of the critical steps in robotic sorting is the accurate perception of objects using RGB-D sensors, which enables the generation of precise grasping poses. Sorting transparent objects with industrial robots is particularly challenging because of their low visual contrast and the sparse depth measurements they produce. Current transparent-object perception methods based on depth completion attempt to learn multi-level feature representations from RGB and sparse depth data. However, they neglect the effective interaction of shallow RGB-D features and fail to preserve high-frequency details in deep features. To tackle these issues, we propose a lightweight Prior-Encoding-Decoding (PED) cascade depth completion framework that reconstructs complete depth maps from the severely sparse depth data of transparent objects. Specifically, we design a Cross-modal Fine-grained Channel Attention Module (CFCAM) to dynamically integrate shallow RGB-D features and establish dependencies between local and global features. Additionally, we apply the Haar Wavelet Transform (HWT) during the encoding phase and a Nonlinear Activation Upsampling Module (NAUM) during the decoding phase to retain high-frequency details and enhance the richness and integrity of deep feature representations. PED was trained and evaluated on three mainstream datasets: TransCG, ClearGrasp, and Omniverse Object. Grasping experiments on a UR5 robot platform demonstrate that PED enables a high success rate for grasping transparent objects using only a low-cost RGB-D camera. The project code and robotic experiment videos are available at: https://github.com/meiguiz/PED.
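The exact designs of CFCAM, HWT encoding, and NAUM are given in the project repository; as a rough illustration only, the PyTorch sketch below shows (a) a 2D Haar wavelet transform used as a detail-preserving downsampling step that splits a feature map into low- and high-frequency sub-bands, and (b) a generic channel-attention gate for fusing shallow RGB and depth features. All names (`haar_wavelet_transform`, `ChannelAttentionFusion`) are hypothetical, and the attention design is a simplified stand-in, not the paper's CFCAM.

```python
import torch
import torch.nn as nn


def haar_wavelet_transform(x):
    """2D Haar wavelet transform via strided slicing (a sketch, not the
    paper's exact HWT encoder). Splits a (B, C, H, W) map with even H, W
    into four half-resolution sub-bands: the low-frequency average (LL)
    and three high-frequency detail bands (LH, HL, HH). Because the
    transform is invertible, no high-frequency detail is discarded
    during downsampling."""
    a = x[:, :, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # local average (low frequency)
    lh = (-a - b + c + d) / 2  # vertical detail
    hl = (-a + b - c + d) / 2  # horizontal detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh


class ChannelAttentionFusion(nn.Module):
    """Minimal cross-modal channel-attention fusion: a learned per-channel
    gate decides how much each channel draws from the RGB versus the
    depth branch. Simplified stand-in for the fine-grained CFCAM."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global context
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                     # gate in (0, 1)
        )

    def forward(self, rgb_feat, depth_feat):
        gate = self.mlp(torch.cat([rgb_feat, depth_feat], dim=1))
        return gate * rgb_feat + (1 - gate) * depth_feat


if __name__ == "__main__":
    rgb = torch.randn(1, 32, 64, 64)
    depth = torch.randn(1, 32, 64, 64)
    fused = ChannelAttentionFusion(32)(rgb, depth)       # (1, 32, 64, 64)
    ll, lh, hl, hh = haar_wavelet_transform(fused)       # each (1, 32, 32, 32)
    print(fused.shape, ll.shape)
```

In this sketch the four sub-bands would be concatenated (or processed separately) by the encoder, so the downsampled representation still carries the high-frequency bands that ordinary strided convolution or pooling would blur away.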
Figure: Real-world robot grasping and sorting experiments.
Figure: Overview of our proposed PED framework.