Kaixin Bai, Huajian Zeng, Lei Zhang*, Yiwen Liu, Hongli Xu, Zhaopeng Chen, Jianwei Zhang
Transparent object depth perception poses a challenge in everyday life and logistics, primarily because standard 3D sensors cannot accurately capture depth on transparent or reflective surfaces. This limitation significantly affects applications that rely on depth maps and point clouds, especially robotic manipulation. We developed a vision transformer-based algorithm for stereo depth recovery of transparent objects, complemented by an innovative feature post-fusion module that improves depth recovery accuracy by exploiting structural features in images. To address the high cost of dataset collection for stereo camera-based perception of transparent objects, our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation, accelerated by AI algorithms. Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios, enabling precise depth mapping of transparent objects to assist robotic manipulation.
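For readers curious what a feature post-fusion step of this kind might look like, below is a minimal PyTorch sketch. It is not the released implementation: the module name (PostFusion), channel sizes, and the concatenate-then-project design with a residual connection are all illustrative assumptions for exposition only.

# Minimal, illustrative sketch of fusing stereo matching features with
# structural (e.g. edge) features. Names and shapes are assumptions,
# NOT the paper's actual architecture.
import torch
import torch.nn as nn

class PostFusion(nn.Module):
    """Fuse transformer matching features with structural image features."""
    def __init__(self, feat_channels: int = 128, struct_channels: int = 32):
        super().__init__()
        self.proj = nn.Sequential(
            # 1x1 conv mixes the concatenated feature maps channel-wise
            nn.Conv2d(feat_channels + struct_channels, feat_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            # 3x3 conv refines the fused map with local spatial context
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
        )

    def forward(self, matching_feat: torch.Tensor, struct_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([matching_feat, struct_feat], dim=1)
        # Residual connection preserves the original matching signal
        return matching_feat + self.proj(fused)

# Usage with dummy tensors (batch 2, quarter-resolution feature maps):
fusion = PostFusion()
f = torch.randn(2, 128, 60, 80)   # transformer matching features
s = torch.randn(2, 32, 60, 80)    # structural / edge features
out = fusion(f, s)                # -> torch.Size([2, 128, 60, 80])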
The cosmetic product models were purchased from Sketchfab. Transparent container models were custom-designed in-house. Scene models and corresponding materials were sourced from free assets available via BlenderKit.
If you find this work helpful for your research, a citation would be greatly appreciated.
@article{bai2024cleardepth,
  title={ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation},
  author={Bai, Kaixin and Zeng, Huajian and Zhang, Lei and Liu, Yiwen and Xu, Hongli and Chen, Zhaopeng and Zhang, Jianwei},
  journal={arXiv preprint arXiv:2409.08926},
  year={2024}
}
@article{bai2025stereoanything,
  title={StereoAnything: Advanced Zero-Shot Stereo Imaging for Multi-Finger Grasp Detection with Transparent Objects},
  author={Bai, Kaixin and Zhang, Lei and Liu, Yiwen and Chen, Zhaopeng and Zhang, Jianwei},
  journal={Authorea Preprints},
  year={2025},
  doi={10.36227/techrxiv.174612328.83478240/v1},
  url={https://doi.org/10.36227/techrxiv.174612328.83478240/v1},
  publisher={Authorea}
}