Task 1 - Egocentric 3D Hand Pose Estimation
Overview
AssemblyHands is a large-scale benchmark dataset with accurate 3D hand pose annotations to facilitate the study of egocentric activities with challenging hand-object interactions. The dataset includes synchronized egocentric and exocentric images sampled from the recent Assembly101 dataset, in which participants assemble and disassemble take-apart toys. The official website is https://assemblyhands.github.io/.
Figure 1: Rows 1-4: exocentric images from static cameras; Row 5: egocentric images from a VR headset.
Instructions
Using AssemblyHands, this challenge focuses on egocentric 3D hand pose estimation from a single-view image. The training, validation, and testing sets contain 383,684, 32,179, and 62,043 images, respectively. We provide the following annotations:
3D hand keypoint coordinates (21 keypoints for each hand)
Hand bounding boxes
Camera intrinsic & extrinsic matrices for 4 egocentric cameras attached to the headset
At test time, we will provide these annotations except for the 3D keypoints. Since the hand bboxes are given, it is not necessary to run hand detection on the test data. We expect participants to run their pose estimator on single-view egocentric images.
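Since both intrinsic and extrinsic matrices are provided, the 3D keypoint annotations can be related to image coordinates via a standard pinhole projection. The sketch below illustrates this; the array shapes and coordinate conventions are assumptions for illustration, not the toolkit's actual API (see the toolkit's data loader for the real format).

```python
import numpy as np

def project_keypoints(kpts_world, extrinsics, intrinsics):
    """Project Nx3 world-space keypoints to Nx2 pixel coordinates.

    extrinsics: 3x4 world-to-camera matrix [R | t] (assumed convention)
    intrinsics: 3x3 camera matrix K
    """
    n = kpts_world.shape[0]
    homo = np.concatenate([kpts_world, np.ones((n, 1))], axis=1)  # Nx4
    cam = homo @ extrinsics.T           # Nx3, camera-space coordinates
    pix = cam @ intrinsics.T            # Nx3, homogeneous pixel coordinates
    return pix[:, :2] / pix[:, 2:3]     # perspective divide by depth

# Toy camera at the origin with focal length 100 and zero principal point
K = np.array([[100.0, 0.0, 0.0],
              [0.0, 100.0, 0.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.array([[0.1, 0.2, 1.0]])      # a single 3D point, 1 unit in front
print(project_keypoints(pts, Rt, K))   # [[10. 20.]]
```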
For a fair comparison, please note the following rules:
DO NOT use any other data from Assembly101 that are not part of the training set for this challenge
But, it is permitted to use Assembly101's auxiliary data that correspond to the training set, such as
exocentric videos (RGB)
context cues like action labels and object category information
It is permitted to use other sources of publicly available datasets
It is permitted to use hand mesh models (e.g., MANO)
DO NOT use the validation set for training or fine-tuning
It is permitted to use multi-view egocentric images as input. Note that a hand may sometimes be visible from only a single camera.
We ask participants to report which additional cues they used; we will then determine which submissions meet the above criteria.
If you consider using other information for training, please feel free to contact us to see if it is feasible for this challenge.
Data download: Egocentric images, keypoint annotations, and example data-loader and visualization code are provided in the toolkit: https://github.com/facebookresearch/assemblyhands-toolkit.
Evaluation: We use a standard evaluation metric: MPJPE in a wrist-relative space.
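Wrist-relative MPJPE translates both prediction and ground truth so the wrist joint sits at the origin before averaging per-joint Euclidean errors, which removes global translation from the score. A minimal sketch, assuming 21x3 keypoint arrays with the wrist at index 0 (the joint ordering is an assumption; check the toolkit):

```python
import numpy as np

def mpjpe_wrist_relative(pred, gt):
    """Mean per-joint position error after subtracting the wrist joint."""
    pred_rel = pred - pred[0:1]   # translate so the predicted wrist is the origin
    gt_rel = gt - gt[0:1]         # same for ground truth
    return np.linalg.norm(pred_rel - gt_rel, axis=1).mean()

gt = np.zeros((21, 3))
pred = np.zeros((21, 3))
pred[:, 0] = 1.0  # constant 1-unit offset on every joint
# A global translation cancels in wrist-relative space, so the error is zero.
print(mpjpe_wrist_relative(pred, gt))  # 0.0
```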
Submission Format (updated on Aug 25): We will evaluate two-hand predictions in world coordinates. Please follow these submission instructions: https://github.com/ut-vision/HANDS2023-AssemblyHands. There you can find test images, metadata, and an example submission file. Please compress your prediction JSON into a zip file and upload it to the submission server: https://codalab.lisn.upsaclay.fr/competitions/15149. Our baseline result is listed under the name "NieLin_UTokyo".
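Packaging a submission amounts to writing the prediction JSON and zipping it. The JSON schema below (image name mapped to per-hand 21x3 keypoint lists in world coordinates) is a hypothetical placeholder; follow the example file in the HANDS2023-AssemblyHands repository for the exact field names and layout.

```python
import json
import zipfile

# Hypothetical prediction structure -- replace with the schema from the
# official example submission file.
predictions = {
    "example_image_0000.jpg": {
        "left": [[0.0, 0.0, 0.0]] * 21,   # 21 keypoints, world coordinates
        "right": [[0.0, 0.0, 0.0]] * 21,
    }
}

with open("pred.json", "w") as f:
    json.dump(predictions, f)

# The submission server expects the prediction JSON inside a zip archive.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("pred.json")
```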
Acknowledgment
We thank Dr. Linlin Yang, Prof. Angela Yao (NUS), Dr. Kun He (Meta), and Prof. Yoichi Sato (UTokyo) for helpful discussions on the design of the challenge. This dataset is based on the internship work at Meta Reality Labs, and thanks go to Dr. Fadime Sener, Dr. Tomas Hodan, Dr. Luan Tran, and Dr. Cem Keskin (Meta). We also thank Mr. Nie Lin (UTokyo) for the baseline construction.