TGF-Net: Sim2Real Transparent Object 6D Pose Estimation Based on Geometric Fusion

Authors: Haixin Yu*, Shoujie Li*, Houde Liu, Chongkun Xia, Wenbo Ding, Bin Liang

Transparent objects are a common part of daily life, but their unique optical properties make estimating their 6D pose a challenging task. In this letter, we propose TGF-Net, a monocular instance-level 6D pose estimation method for transparent objects based on geometric fusion. TGF-Net learns the edge features and surface fragments of transparent objects as intermediate features, and reduces the influence of appearance changes on 6D pose estimation by fusing rich geometric features in the network. Moreover, we propose an approach for generating a high-fidelity, large-scale synthetic dataset of transparent objects using Blender, and use this approach to generate the synthetic dataset Trans6D-32K. Trans6D-32K contains rendered RGB images and pose information for transparent objects under a variety of backgrounds, perspectives, and lighting conditions. To evaluate the performance of TGF-Net on 6D pose estimation of transparent objects, we compare it with multiple related works on Trans6D-32K. TGF-Net can be trained entirely on synthetic data and applied directly to real-world scenarios without fine-tuning. Multiple challenging real-scene experiments demonstrate the good performance of TGF-Net, while grasping experiments demonstrate its application value in transparent object manipulation.

Trans6D-32K Dataset

https://drive.google.com/file/d/1mcJCIxsgXAQk9E7L4eIhnWG2mFarZjd1/view?usp=sharing

The proposed dataset contains ten kinds of objects, all of which are common household items. To cover as many object types as possible, the set includes 5 symmetrical and 5 asymmetrical objects.

We select 400 images of each object for training, which is large enough to train an accurate 6D pose estimation network. The backgrounds of the 400 training images are obtained by extracting one frame out of every ten from a video. In addition, we extract 5000 backgrounds from a completely different video and use them to randomly generate the test dataset.
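The background selection described above can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the file-name pattern, and the 4000-frame video length are hypothetical, chosen only so that a stride of ten yields 400 training backgrounds. It also assumes test backgrounds are drawn uniformly at random with replacement from the 5000-image pool, which the page does not specify.

```python
import random


def every_nth_frame(total_frames, step=10):
    """Return the indices kept when taking one frame out of every `step` frames."""
    return list(range(0, total_frames, step))


def pick_test_backgrounds(pool, num_images, seed=0):
    """Pick one background per test image, uniformly at random with replacement.

    Sampling with replacement is an assumption; the source only says the
    test set is generated randomly from the background pool.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return [rng.choice(pool) for _ in range(num_images)]


# Hypothetical training video of 4000 frames -> 400 background frame indices.
train_frame_indices = every_nth_frame(total_frames=4000, step=10)

# Hypothetical names for the 5000 backgrounds taken from a different video.
test_pool = [f"test_bg_{i:04d}.png" for i in range(5000)]
test_backgrounds = pick_test_backgrounds(test_pool, num_images=8)
```

Drawing the training and test backgrounds from different videos, as the page describes, keeps the two sets disjoint, so test results are not inflated by backgrounds already seen during training.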

Experiment Evaluation

To verify that the proposed TGF-Net can be directly applied to real-world scenarios, we conducted experiments with real objects. In total, we ran four experiments to study the effects of changing backgrounds, different internal liquids, and changing lighting on the 6D pose estimation of transparent objects. Finally, we used a robotic arm to grasp transparent objects to demonstrate the application value of our method.

video.mp4
