Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image
Yunzhi Lin Jonathan Tremblay Stephen Tyree Patricio A. Vela Stan Birchfield
NVIDIA Georgia Tech
Abstract: Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated sequentially, leveraging the recent idea of convGRU to propagate information from easier tasks to more difficult ones. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).
Paper: arXiv (published at ICRA 2022)
Code: GitHub
We focus on pose estimation (6-DoF translation and rotation, up to an unknown scale) for category-level objects. A single network handles each category.
Given a monocular RGB image, the network represents each object as a center point, predicts several output modalities (such as 2D keypoints and relative cuboid dimensions), and computes the 6-DoF pose with perspective-n-point (PnP).
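To make the geometry concrete, here is a minimal sketch of the forward model that PnP inverts: a bounding cuboid with relative dimensions is placed in the camera frame and its center and eight vertices are projected to 2D keypoints. It is not the code from the repository. The function names, the vertex ordering, the axis convention, and the camera intrinsics are all assumptions chosen for illustration. The last lines also show why the pose is recovered only up to scale: enlarging the cuboid and moving it proportionally farther away yields the same pixels.

```python
from itertools import product


def cuboid_keypoints(rel_dims):
    """Center plus 8 vertices of a cuboid in the object frame.

    rel_dims is a hypothetical (width, height, depth) triple; the vertex
    ordering here is illustrative, not the ordering used by any dataset.
    """
    half = tuple(d / 2.0 for d in rel_dims)
    points = [(0.0, 0.0, 0.0)]
    for signs in product((-1.0, 1.0), repeat=3):
        points.append(tuple(s * h for s, h in zip(signs, half)))
    return points


def to_camera(R, t, p):
    """Apply the rigid transform x_cam = R @ p + t using plain lists."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(points_obj, R, t, fx, fy, cx, cy):
    """Pinhole projection of object-frame points to pixel coordinates."""
    pixels = []
    for p in points_obj:
        x, y, z = to_camera(R, t, p)
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Assumed example values: 90-degree rotation about the camera z-axis,
# object 2 units in front of the camera, made-up intrinsics.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 2.0)
intrinsics = dict(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
rel_dims = (1.0, 0.5, 0.8)

keypoints_3d = cuboid_keypoints(rel_dims)
keypoints_2d = project(keypoints_3d, R, t, **intrinsics)

# Scale ambiguity: a cuboid 3x larger and 3x farther away looks identical.
scale = 3.0
scaled_2d = project(
    cuboid_keypoints(tuple(scale * d for d in rel_dims)),
    R,
    tuple(scale * v for v in t),
    **intrinsics,
)
max_pixel_diff = max(
    abs(a - b) for p, q in zip(keypoints_2d, scaled_2d) for a, b in zip(p, q)
)
```

A PnP solver takes the 3D points from `cuboid_keypoints` together with detected 2D keypoints and known intrinsics, and searches for the `R` and `t` that best reproduce those pixels. Because `max_pixel_diff` is zero up to floating-point error, a single RGB image cannot fix the absolute size, which is why relative rather than metric cuboid dimensions are regressed.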
From the robot's view
With multiple objects (including transparent ones)
@inproceedings{lin2022icra:centerpose,
title={Single-Stage Keypoint-based Category-level Object Pose Estimation from an {RGB} Image},
author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A. and Birchfield, Stan},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
month = May,
year=2022
}