KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

Xingyu Liu, Rico Jonschkowski, Anelia Angelova, Kurt Konolige


Abstract

Estimating the 3D pose of desktop objects is crucial for applications such as robotic manipulation. Many existing methods use a depth map of the object, captured by an RGBD sensor, for both training and prediction. These methods are thus restricted to opaque, Lambertian objects that produce good returns from the depth sensor. In this paper we forgo the depth sensor in favor of raw stereo input. We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with a stereo sensor; second, we develop a method, called KeyPose, that learns to accurately predict object pose using 3D keypoints, including for challenging objects such as transparent ones. To showcase the performance of the method, we create and employ a dataset of 15 clear objects in 5 classes, with 48k 3D-keypoint-labeled images. Given a loose bounding box of the object, we train both instance and category models, and show generalization to new textures, poses, and objects. KeyPose surpasses state-of-the-art performance in 3D pose estimation on this dataset by factors of 1.5 to 3.5, even in cases where the competing method is provided with ground-truth depth. Compared with monocular input, stereo input improves accuracy by a factor of 2. We will release a public version of the data capture and labeling pipeline, the transparent object dataset, and the KeyPose models and evaluation code.

Links

CVPR 2020 Conference Paper: arXiv

Contact: "xingyul3 {at} cs {dot} cmu {dot} edu" for more information


Dataset

License

This dataset is made available under the Creative Commons 4.0 License.

Object Mesh Model Download

The object 3D CAD mesh models and 3D keypoint definitions can be downloaded here. Lengths in the 3D CAD meshes and 3D keypoint definitions are in meters.
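As a quick sanity check on the units, the meshes can be inspected with a generic mesh library. This is a sketch only; the file name and the .obj format are assumptions about the archive layout:

import trimesh  # pip install trimesh

mesh = trimesh.load("meshes/bottle_0.obj")  # hypothetical path and format
print(mesh.bounds)                          # axis-aligned bounds, in meters
mesh.apply_scale(1000.0)                    # rescale to millimeters if a downstream tool expects mm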

Training Data Download

The dataset consists of image sequences, with 40 sequences per object; each per-object download is about 10 GB. Each sequence contains entries numbered 000000 through 000080, laid out as follows:

bottle_0/
  texture_0_pose_0/
    000000_L.png      - Left image (reference image)
    000000_L.pbtxt    - Left image parameters
    000000_R.png      - Right image
    000000_R.pbtxt    - Right image parameters
    000000_border.png - Border of the object in the left image (grayscale)
    000000_mask.png   - Mask of the object in the left image (grayscale)
    000000_Dt.exr     - Depth image for the transparent object
    000000_Do.exr     - Depth image for the opaque object
    ...
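As a minimal sketch of how one entry might be read in Python (file names follow the listing above; the OpenCV EXR flag and the raw-text handling of the .pbtxt camera parameters are assumptions, since the repository provides its own loaders):

import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before cv2 is imported

import cv2

def load_entry(seq_dir, idx):
    # Build the common file prefix, e.g. bottle_0/texture_0_pose_0/000000.
    prefix = os.path.join(seq_dir, "%06d" % idx)
    left   = cv2.imread(prefix + "_L.png", cv2.IMREAD_COLOR)
    right  = cv2.imread(prefix + "_R.png", cv2.IMREAD_COLOR)
    border = cv2.imread(prefix + "_border.png", cv2.IMREAD_GRAYSCALE)
    mask   = cv2.imread(prefix + "_mask.png", cv2.IMREAD_GRAYSCALE)
    # Depth maps are stored as OpenEXR floats; IMREAD_UNCHANGED preserves them.
    depth_t = cv2.imread(prefix + "_Dt.exr", cv2.IMREAD_UNCHANGED)
    depth_o = cv2.imread(prefix + "_Do.exr", cv2.IMREAD_UNCHANGED)
    # The .pbtxt files are protobuf text format; read them as raw text here.
    with open(prefix + "_L.pbtxt") as f:
        left_params = f.read()
    return left, right, border, mask, depth_t, depth_o, left_params

entry = load_entry("bottle_0/texture_0_pose_0", 0)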

The data for each object can be downloaded by clicking on the images below:

Code

Sample code for displaying the data and running pre-built models is available on GitHub:

svn export https://github.com/google-research/google-research/trunk/keypose

You can also browse the repository directly at https://github.com/google-research/google-research/tree/master/keypose.
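If svn is unavailable, a sparse git checkout is one way to fetch only the keypose directory. This is a sketch; the --sparse and --filter flags assume a recent version of Git:

git clone --depth 1 --filter=blob:none --sparse https://github.com/google-research/google-research.git
cd google-research
git sparse-checkout set keypose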

Bibtex

@inproceedings{liu2020keypose,
  title     = {KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects},
  author    = {Xingyu Liu and Rico Jonschkowski and Anelia Angelova and Kurt Konolige},
  booktitle = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)},
  year      = {2020},
}