Fast Uncertainty Quantification for Deep Object Pose Estimation


[Paper Link][Code]

Guanya Shi, Yifeng Zhu, Jonathan Tremblay, Stan Birchfield,

Fabio Ramos, Animashree Anandkumar, Yuke Zhu

Technical Summary Video

Motivation

  • Deep object pose estimators are often unreliable and overconfident, especially when the input image lies outside the training domain, for instance in sim-to-real transfer.

  • The following two images show pose estimates of the Ketchup object from a state-of-the-art (SOTA) pose estimator. Both predictions are highly confident, but the one on the right is wrong!

  • For many robotics tasks, we need to efficiently and robustly quantify the uncertainty of the pose estimates produced by deep learning-based object pose estimators.

  • Prior uncertainty quantification methods for the object pose estimation task require heavy modifications of the training process or the model inputs.

  • We develop a simple, efficient, and plug-and-play uncertainty quantification method for the 6-DoF object pose estimation task, using an ensemble of K pre-trained estimators with different architectures and/or training data sources.

Proposed Method

  • We first train K deep object pose estimators with different architectures and training data sources. For example, here we present three models with two different architectures, trained on two different synthetic data sources.

  • At inference time, we feed an image to all K estimators to obtain K pose predictions and compute their average pairwise disagreement under a metric function f; this disagreement serves as the uncertainty estimate (see the sketch after this list).

  • We study four choices of the disagreement metric f:

    1. Translational

    2. Rotational

    3. Average distance (ADD)

    4. Learned (note that the learned metric requires labeled data on the target domain)

  • Two examples (larger disagreement -> higher uncertainty, smaller disagreement -> lower uncertainty):
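To make the disagreement computation concrete, below is a minimal NumPy sketch. It assumes each estimator returns a pose as a rotation matrix R and a translation vector t, and that `model_points` holds 3D points sampled from the object model; all names are illustrative, not the released code.

```python
# Minimal sketch of the ensemble-disagreement computation (not the released code).
# Assumptions: each estimator returns a pose (R, t) with R a 3x3 rotation matrix
# and t a 3-vector, and `model_points` is an (N, 3) array of points sampled on
# the object model; all names are illustrative.
import itertools

import numpy as np


def translational(pose_a, pose_b):
    """Euclidean distance between the two predicted translations."""
    return np.linalg.norm(pose_a[1] - pose_b[1])


def rotational(pose_a, pose_b):
    """Geodesic angle (in radians) between the two predicted rotations."""
    R_rel = pose_a[0].T @ pose_b[0]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_angle)


def add_disagreement(pose_a, pose_b, model_points):
    """Average distance (ADD) between the model points under the two poses."""
    pts_a = model_points @ pose_a[0].T + pose_a[1]
    pts_b = model_points @ pose_b[0].T + pose_b[1]
    return np.linalg.norm(pts_a - pts_b, axis=1).mean()


def ensemble_uncertainty(poses, metric):
    """Average pairwise disagreement f over the K ensemble predictions."""
    pairs = list(itertools.combinations(poses, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Usage (hypothetical): poses = [estimator(image) for estimator in ensemble]
# uncertainty = ensemble_uncertainty(
#     poses, lambda a, b: add_disagreement(a, b, model_points))
```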

Quantitative Results

  • We first study the correlation between the proposed uncertainty measure and the true pose estimation errors, using Spearman's rank correlation coefficient. This analysis is performed on the real-world HOPE dataset (a sketch of the analysis follows).

    • We find that our method yields much stronger correlations than the baselines, and that ADD is the best learning-free disagreement metric, only slightly worse than the learned metric.
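As a concrete illustration of this analysis, the following sketch computes Spearman's rank correlation with SciPy, assuming per-image uncertainty scores and ground-truth pose errors (e.g., ADD to the true pose) have already been computed; the variable names are illustrative.

```python
# Minimal sketch of the correlation analysis, assuming per-image uncertainty
# scores and ground-truth pose errors (e.g., ADD to the true pose) have already
# been computed; variable names are illustrative.
from scipy.stats import spearmanr


def rank_correlation(uncertainties, true_errors):
    """Spearman's rank correlation between predicted uncertainty and true error.

    A coefficient close to 1 means images ranked as more uncertain also tend to
    have larger pose errors, which is what a useful uncertainty measure provides.
    """
    rho, p_value = spearmanr(uncertainties, true_errors)
    return rho, p_value
```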

  • (Application I) We apply the proposed uncertainty quantification method to a camera perspective selection task. The data is generated by ViSII, a ray-tracing-based renderer. We find that our method significantly reduces the pose estimation errors of the selected frames.

  • (Application II) We then examine the utility of our uncertainty quantification method in an uncertainty-guided robotic grasping task, where it chooses the best viewpoint for a real robot arm. Our method increases the grasping success rate from 35% to 90% (a minimal sketch of the view-selection rule follows; see the two videos below for examples).
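Both applications reduce to picking the candidate view whose ensemble predictions disagree the least. Below is a minimal sketch of that selection rule, reusing `ensemble_uncertainty` and `add_disagreement` from the earlier sketch; `views` (candidate images) and `ensemble` (list of estimators) are hypothetical placeholders, not the paper's interface.

```python
# Minimal sketch of uncertainty-guided view selection, reusing
# `ensemble_uncertainty` and `add_disagreement` from the earlier sketch.
# `views` (candidate images) and `ensemble` (list of estimators) are
# hypothetical placeholders, not the paper's interface.
def select_view(views, ensemble, model_points):
    """Return the candidate view whose ensemble predictions disagree the least."""
    best_view, best_poses, best_u = None, None, float("inf")
    for image in views:
        poses = [estimator(image) for estimator in ensemble]
        u = ensemble_uncertainty(
            poses, lambda a, b: add_disagreement(a, b, model_points))
        if u < best_u:
            best_view, best_poses, best_u = image, poses, u
    return best_view, best_poses
```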

BBQ_withsubtitle.mp4
Ketchup_withsubtitle.mp4