Task 1: Hand-held Object Pose Estimation



Overview

The goal of this task is to estimate the pose of hand-held objects from a single RGB image. Hand-held object pose estimation has several applications in robotics and augmented reality. Accurate object poses can be a crucial input during handovers in human-robot interaction, or in metaverse applications that aim to blur the boundary between the real and the virtual worlds. Many previous methods for object pose estimation have focused on estimating the 6D pose of known objects (with 3D object models available) in non-interactive scenarios [1,2]. In interactive scenarios such as hand-object interaction, the manipulator (the hand) provides an important cue/prior about the pose of the manipulated object and can be exploited to improve the accuracy of the estimated object pose [3].

In this challenge, we restructure the HO-3D dataset to create a new train/test split and provide the 3D hand and object poses for the train split, but only the hand poses for the test split. Participants are required to estimate the object pose from the RGB image in the test split and are encouraged to use the provided ground-truth hand poses; they are also free to estimate the hand poses themselves. Submissions are evaluated on the CodaLab server.

Dataset Details

The train/test split contains 10 objects from the YCB dataset [4] that were originally used in the HO-3D dataset. The train set contains 79,889 images and the test set contains 19,852 images. Note that the object translation is defined relative to the root joint (wrist) of the hand, not the camera optic centre (see the sketch below).
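Since the object translation is expressed relative to the wrist rather than the camera, a common first step is to move it into the camera frame by adding the wrist position. The following is a minimal sketch of this step, assuming the hand root joint is available in camera coordinates and is the first joint in the ordering of 'skeleton.txt'; the variable names are illustrative and are not the dataset's actual annotation keys.

    import numpy as np

    def object_translation_in_camera(hand_joints_3d: np.ndarray,
                                     obj_trans_rel: np.ndarray) -> np.ndarray:
        """Recover the object translation in camera coordinates.

        hand_joints_3d: (21, 3) hand joints in camera coordinates,
                        with the root joint (wrist) assumed at index 0.
        obj_trans_rel:  (3,) object translation relative to the wrist joint.
        """
        wrist = hand_joints_3d[0]      # root joint (wrist) in the camera frame
        return wrist + obj_trans_rel   # object translation in the camera frame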

The following annotations are provided in the train split:

  • Object Pose (translation relative to hand wrist joint)

  • Object name

  • Object corner locations in the image

  • MANO hand pose parameters

  • MANO hand shape parameters

  • Hand 3D joint locations

  • Hand 2D joint locations in the image

  • Hand-object segmentation map

The following information is provided for the test split (see the MANO sketch after this list):

  • MANO hand pose parameters

  • MANO hand shape parameters

  • Hand 3D joint locations

  • Hand 2D joint locations in the image
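Both splits provide MANO pose and shape parameters. As a rough illustration of how hand joints and vertices can be recovered from them, below is a minimal sketch using the third-party manopth library (a PyTorch MANO layer). Whether the dataset's pose parameters are PCA coefficients or full 48-D axis-angle vectors, and which units are used, should be verified against the visualization scripts in the challenge GitHub repository.

    import torch
    from manopth.manolayer import ManoLayer

    # The MANO model files must be obtained separately from the official MANO website.
    mano_layer = ManoLayer(mano_root='mano/models',  # path to MANO model files (assumption)
                           use_pca=False,            # assume full 48-D axis-angle pose
                           flat_hand_mean=True)

    pose = torch.zeros(1, 48)    # global rotation (3) + articulation (45)
    shape = torch.zeros(1, 10)   # MANO shape (beta) parameters

    # manopth returns 778 hand vertices and 21 joints (in millimetres by default).
    verts, joints = mano_layer(pose, shape)
    print(verts.shape, joints.shape)   # (1, 778, 3), (1, 21, 3)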


Rules of Participation

  • Participants are not allowed to use the original HO-3D train/test split, since the test split of this challenge overlaps with the train split of the original HO-3D. The train/test split for this challenge has been carefully chosen so that such violations can be detected, and violators will be immediately disqualified.

  • Use of other labeled datasets (either real or synthetic) is not allowed.

  • Use of rendered images using the provided hand-object poses is allowed.

  • Use of external unlabeled data is allowed (e.g., for self-supervised and unsupervised methods).


Evaluation

The accuracy of the methods is evaluated with the standard Mean Symmetry-aware Surface Distance (MSSD) metric [5], which accounts for object symmetries. Due to severe occlusion of the object by the hand, distinctive features on the object may not be visible, leading to ambiguous poses. The MSSD metric is defined as

e_{MSSD} = \min_{S \in S_M} \max_{x \in V_M} \| P x - \hat{P} S x \|_2

where S_M is the set of global symmetry transformations, V_M is the set of mesh vertices of the object model M, \hat{P} is the ground-truth pose and P is the estimated pose. The global angle of symmetry for each of the 10 objects is given in the table below.
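For reference, a straightforward NumPy sketch of this computation, given the model vertices, the two poses as 4x4 matrices and a list of 4x4 symmetry transformations (which should include the identity), is shown below; it is not the official evaluation code.

    import numpy as np

    def mssd(verts, P_gt, P_est, symmetries):
        """verts: (N, 3) object model vertices V_M.
        P_gt, P_est: (4, 4) ground-truth and estimated poses.
        symmetries: list of (4, 4) global symmetry transforms S_M (incl. identity).
        """
        verts_h = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)  # homogeneous coords
        est = (P_est @ verts_h.T).T[:, :3]          # vertices under the estimated pose
        errors = []
        for S in symmetries:
            gt = (P_gt @ S @ verts_h.T).T[:, :3]    # ground truth composed with symmetry S
            errors.append(np.linalg.norm(est - gt, axis=1).max())
        return min(errors)                          # minimum over all symmetries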

Submission Format

The estimated object rotation and translation relative to the hand root joint should be saved in a JSON file. Please refer to the challenge_submit.py script in https://github.com/shreyashampali/HANDS2022_Obj_Pose for the exact submission format. The JSON files should be compressed into a .zip file before submission (see the sketch below).
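As a rough illustration only, the snippet below writes a dictionary of per-image rotations and translations to a JSON file and compresses it; the key names ("rot", "trans") are placeholders, and the authoritative format is defined by challenge_submit.py in the GitHub repository.

    import json
    import zipfile

    # Placeholder structure: one entry per test image (keys and layout are assumptions).
    predictions = {
        "0000": {"rot": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # 3x3 rotation matrix
                 "trans": [0.0, 0.0, 0.0]},                  # translation w.r.t. hand root joint
    }

    with open("pred.json", "w") as f:
        json.dump(predictions, f)

    with zipfile.ZipFile("submission.zip", "w") as zf:
        zf.write("pred.json")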


General comments

  • Ordering of the joints: Please refer to 'skeleton.txt' in the dataset folder for the ordering of the joints.

  • The images in this dataset are cropped from the images of the original HO-3D dataset. The object translation must be estimated relative to the root joint of the hand.

  • Coordinate system: All annotations use the OpenCV coordinate system, i.e., positive x-axis to the right, positive y-axis downwards and positive z-axis into the scene (see the projection sketch below).
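For illustration, projecting a 3D point given in this convention onto the image with a pinhole intrinsic matrix K could look like the sketch below; the intrinsic matrix shown is an arbitrary placeholder, not the dataset's calibration.

    import numpy as np

    # Placeholder pinhole intrinsics (fx, fy, cx, cy); not the actual camera calibration.
    K = np.array([[614.0,   0.0, 320.0],
                  [  0.0, 614.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    def project(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
        """points_3d: (N, 3) in camera coordinates (x right, y down, z forward);
        returns (N, 2) pixel coordinates."""
        uvw = (K @ points_3d.T).T
        return uvw[:, :2] / uvw[:, 2:3]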


Links

Dataset Download Link: Please fill out the form to get the download link

Github Page: https://github.com/shreyashampali/HANDS2022_Obj_Pose (contains visualization, submission, and evaluation scripts)

Codalab Challenge: https://codalab.lisn.upsaclay.fr/competitions/6290

References

[1] Mahdi Rad and Vincent Lepetit. “BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth”. In Proc. IEEE International Conference on Computer Vision (ICCV), 2017.

[2] Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, and Nassir Navab. “SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again”. In Proc. IEEE International Conference on Computer Vision (ICCV), 2017.

[3] Yufei Ye, Abhinav Gupta, and Shubham Tulsiani. “What's in your hands? 3D Reconstruction of Generic Objects in Hands”. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[4] Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. “PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes”. In Proc. Robotics: Science and Systems (RSS), 2018.

[5] Tomas Hodan, Martin Sundermeyer, Bertram Drost, Yann Labbe, Eric Brachmann, Frank Michel, Carsten Rother, and Jiri Matas. “BOP Challenge 2020 on 6D Object Localization”. In Proc. European Conference on Computer Vision (ECCV) Workshops, 2020.