Task 2: Semi/Self-supervised Two-hands 3D Pose Estimation during Hand-object and Hand-hand interactions
Abstract
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. The official website is https://assembly-101.github.io.
Instructions
Based on Assembly101, this challenge emphasizes training with reduced ground-truth labels and focuses on semi-supervised and self-supervised learning for hand pose estimation systems. We target two-hand 3D pose estimation. For evaluation, we use end-point error and a PCK curve that takes annotation confidence into account, as sketched below.
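The snippet below is a minimal sketch of these two metrics, assuming predictions and ground truth come as (N, 42, 3) arrays in millimetres and that conf holds per-joint annotation confidences in [0, 1]; the exact weighting used by the official evaluation may differ.

```python
import numpy as np

def end_point_error(pred, gt, conf):
    """Mean Euclidean distance per joint, weighted by annotation confidence."""
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, 42) per-joint distances
    return (dist * conf).sum() / conf.sum()

def pck_curve(pred, gt, conf, thresholds=np.linspace(0, 50, 101)):
    """Confidence-weighted fraction of joints within each distance threshold."""
    dist = np.linalg.norm(pred - gt, axis=-1)          # (N, 42)
    return np.array([(conf * (dist <= t)).sum() / conf.sum() for t in thresholds])
```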
Specifically, we will provide:
Multi-view hand-object interaction videos without 3D hand pose annotations
Camera intrinsic & extrinsic matrices for the simultaneous static (8) and egocentric (4) recordings (see the projection sketch after this list)
A validation multi-view video with human-annotated 3D & 2D labels (cannot be used for training)
An evaluation multi-view video without labels, used for testing
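As referenced above, the sketch below illustrates one way the provided camera matrices could be used, e.g., to reproject predicted 3D joints into a given view and compare them against 2D evidence such as OpenPose keypoints for self-supervision. The matrix conventions assumed here (3x3 intrinsics K, 4x4 world-to-camera extrinsics T) are assumptions; please check the dataset documentation for the exact format.

```python
import numpy as np

def project_to_view(joints_3d, K, T):
    """Project (42, 3) world-frame joints into pixel coordinates of one camera."""
    homo = np.concatenate([joints_3d, np.ones((joints_3d.shape[0], 1))], axis=1)  # (42, 4)
    cam = (T @ homo.T).T[:, :3]        # world -> camera coordinates
    uv = (K @ cam.T).T                 # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]      # perspective divide -> (42, 2) pixel coordinates
```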
For a fair comparison, only the following information may be used:
Any method for obtaining hand bounding boxes or hand segmentations (one is provided in the dataset)
OpenPose for predicted 2D poses
Either the static RGB videos or the egocentric videos for semi-/self-supervised training
Any synthetic data (e.g., existing synthetic datasets such as RHD, or self-generated synthetic data such as ObMan)
The HO-3D dataset from our HANDS22 Challenge (no other real-world datasets are allowed)
Hand models (e.g., MANO)
The validation/evaluation sets cannot be used for training or fine-tuning
If you would like to use other information for training, please contact us to check whether it is permissible for this challenge.
Submission Format
The results are evaluated using the CodaLab server: https://codalab.lisn.upsaclay.fr/competitions/6979.
The estimated hand poses for each video should be saved in a JSON file. Please refer to the challenge_submit.py script in https://github.com/bestonebyone/HANDS2022_Assembly101 for the submission format. The JSON files should be compressed into a .zip file before submission; a sketch of the packaging step is shown below.
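The snippet below only illustrates the mechanics of writing predictions to JSON and zipping them; the dictionary layout assumed here (frame id mapping to a 42x3 nested list) is an assumption, and challenge_submit.py remains the authoritative specification of the format.

```python
import json
import zipfile

def save_predictions(predictions, json_path="predictions.json", zip_path="submission.zip"):
    """predictions: dict mapping frame id -> (42, 3) nested list of joint coordinates (assumed layout)."""
    with open(json_path, "w") as f:
        json.dump(predictions, f)
    # Compress the JSON file into a .zip archive for upload to the CodaLab server.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path)
```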
Visualisation
Please refer to the validation_vis.py script in https://github.com/bestonebyone/HANDS2022_Assembly101 for visualising the validation set.
Ordering of the joints
For each frame, the prediction (42x3) should follow indices 0-20 for the right hand and 21-41 for the left hand: 0-3 right thumb [tip to MCP], 4-7 right index, 8-11 right middle finger, 12-15 right ring finger, 16-19 right pinky finger, 20 right wrist; 21-24 left thumb, 25-28 left index, ..., 41 left wrist. Please check the annotations in the validation set for more information; a small helper spelling out this ordering follows.
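The helper below expands this ordering into a list of joint names. Only the index ranges above are authoritative; the per-finger joint labels (tip/dip/pip/mcp) are assumed for readability.

```python
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
PARTS = ["tip", "dip", "pip", "mcp"]   # listed tip to MCP, matching the description above

JOINT_NAMES = []
for side in ("right", "left"):         # right hand first (0-20), then left hand (21-41)
    for finger in FINGERS:
        for part in PARTS:
            JOINT_NAMES.append(f"{side}_{finger}_{part}")
    JOINT_NAMES.append(f"{side}_wrist")  # wrist comes last for each hand

assert len(JOINT_NAMES) == 42
assert JOINT_NAMES[20] == "right_wrist" and JOINT_NAMES[41] == "left_wrist"
```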
Acknowledgement
Thanks to the Assembly101 team for providing Assembly101 for our challenge, and special thanks to Dr. Kun He for the annotations of the dataset.