Task 2 - Consistent Motion Reconstruction
Dataset Overview
Figure 1: ARCTIC is a dataset of hands dexterously manipulating articulated objects. The dataset contains videos from both eight 3rd-person allocentric views (a) and one 1st-person egocentric view (b), together with accurate ground-truth 3D hand and object meshes, captured with a high-quality motion capture system. ARCTIC goes beyond existing datasets to enable the study of dexterous bimanual manipulation of articulated objects (c) and provides detailed contact information between the hands and objects during manipulation (d-e).
More details can be found on our website: https://arctic.is.tue.mpg.de/
5-Minute Talk
Task description
Our ARCTIC challenge focuses on the task of consistent motion reconstruction, introduced in our paper. Given a monocular RGB video, the goal is to reconstruct the surfaces of two MANO hands and the articulated object at every frame. The task emphasizes the consistency of hand-object contact in the reconstructed hand and object surfaces. In this challenge, we use the official splits of the ARCTIC dataset, consisting of training, validation, and test sets (2.1M images in total). The challenge follows the experiment protocol defined in our paper.
Here we briefly summarize the protocol:
Participants will train on the training set (not the validation set).
The validation set will be used for local evaluation.
The test set groundtruth is hidden and participants can submit their predictions to our evaluation server for test set evaluation (details coming soon).
Participants will use the following data from ARCTIC in this challenge:
MANO hand pose 3D annotation (training only)
Articulated object poses 3D annotation (training only)
Pre-defined bounding boxes around the object for network inputs
Camera extrinsics (not intrinsics in allocentric setting)
Sub-Tasks: Since ARCTIC contains eight 3rd-person views and one egocentric view, we host two sub-tasks for this challenge: the allocentric task and the egocentric task. In the allocentric task, participants should use the 3rd-person images for both training and evaluation. In the egocentric task, participants can use all images from the training set (including the 3rd-person views) during training, but only the egocentric view images are used during evaluation. See Evaluation Protocol in our paper. Since the two sub-tasks share the same formulation in terms of input (RGB image) and outputs (hand and object parameters), participating in both is simply a matter of changing the training and validation sets. We therefore encourage participants to submit to both sub-tasks. Participants interested in only one sub-task can simply skip submitting results for the other.
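The view selection described above can be sketched as a small helper. This is only an illustration and not part of the official code: the function name `views_for` and the view indexing (0 for the egocentric camera, 1 to 8 for the allocentric cameras) are assumptions, so please check the dataset documentation for the actual convention.

```python
from typing import Tuple

# Assumption (hypothetical indexing): view 0 is the egocentric camera,
# views 1..8 are the eight allocentric cameras.
EGO_VIEW = 0
ALLO_VIEWS = tuple(range(1, 9))


def views_for(subtask: str, phase: str) -> Tuple[int, ...]:
    """Return the camera views allowed for a sub-task and phase.

    subtask: "allocentric" or "egocentric"
    phase:   "train" or "eval"
    """
    if subtask not in ("allocentric", "egocentric"):
        raise ValueError(f"unknown sub-task: {subtask!r}")
    if phase not in ("train", "eval"):
        raise ValueError(f"unknown phase: {phase!r}")

    if subtask == "allocentric":
        # 3rd-person images for both training and evaluation.
        return ALLO_VIEWS
    if phase == "train":
        # Egocentric task: all training views may be used.
        return (EGO_VIEW,) + ALLO_VIEWS
    # Egocentric task: only the egocentric view is evaluated.
    return (EGO_VIEW,)
```

For example, `views_for("egocentric", "train")` returns all nine views, while `views_for("egocentric", "eval")` returns only the egocentric one.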
For a fair comparison, please note the following rules:
Only the ARCTIC training set may be used as 3D supervision for training. This means:
The ARCTIC validation set cannot be used to train the model.
Annotations from external datasets cannot be used for training.
The model cannot assume groundtruth camera intrinsics for allocentric setting.
Participants may be requested to submit their code for checking potential rule violation.
The code should be reproducible for the organizers.
The reproduced results from the code should not deviate from the reported results.
If code is requested for inspection and the organizers cannot reproduce the reported results, the participants may not be considered.
Participants violating any rules may not be considered in the challenge. Feel free to contact the organizer (`zicong.fan@inf.ethz.ch`) for clarification.
Getting started:
Ensure that you are registered to the challenge via our Google Form.
Download the data and set up the training code using our code.
The repository includes code for downloading data, training models, visualizing predictions, and evaluating results.
This also includes dataset documentation.
Instructions for online evaluation will come soon. Meanwhile, you are free to use the validation set for offline evaluation and get started.
If you encounter any technical problems, feel free to open an issue in our repo.
Evaluation:
We will use the Contact Deviation (CDev) metric defined in our paper for evaluation.
We will use the MANO and pre-defined articulated object topologies for evaluation. This means:
For submission convenience, if one directly regresses meshes instead of MANO parameters, she needs to fit MANO and object models to the regressed meshes. The fitted parameters will then be submitted to the test set evaluation server.
Participants are free to decimate the hand and object meshes if needed. However, the test server will use the original meshes for evaluation.
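For local evaluation on the validation set, a contact-deviation style metric can be sketched as follows. This is a rough illustration of the idea rather than the official implementation: it assumes that contact pairs are found by matching each ground-truth hand vertex to its nearest ground-truth object vertex within a distance threshold, and that the metric is the mean distance between the same vertex pairs in the prediction. The function names and the threshold value are placeholders; the definition in our paper and the evaluation code in the repository are authoritative. It is written in plain Python for readability, and a practical version would be vectorized.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def contact_pairs(gt_hand: Sequence[Point], gt_obj: Sequence[Point],
                  threshold: float) -> List[Tuple[int, int]]:
    """Find (hand_idx, obj_idx) pairs that are in contact in the ground truth.

    Assumption: a hand vertex is "in contact" when its nearest object
    vertex lies closer than `threshold` (same unit as the vertices).
    """
    pairs = []
    for i, h in enumerate(gt_hand):
        # Index of the nearest ground-truth object vertex to hand vertex i.
        j = min(range(len(gt_obj)), key=lambda k: math.dist(h, gt_obj[k]))
        if math.dist(h, gt_obj[j]) < threshold:
            pairs.append((i, j))
    return pairs


def contact_deviation(pred_hand: Sequence[Point], pred_obj: Sequence[Point],
                      pairs: Sequence[Tuple[int, int]]) -> float:
    """Mean distance between predicted vertices of the ground-truth contact pairs."""
    if not pairs:
        return float("nan")  # no contact in this frame
    total = sum(math.dist(pred_hand[i], pred_obj[j]) for i, j in pairs)
    return total / len(pairs)
```

Because both functions index vertices directly, the predicted and ground-truth meshes must share the same topology, which is why the original (non-decimated) MANO and object meshes matter for evaluation.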
Submission Instruction: See here.
Updated deadline for ARCTIC submission: Sept 23, 23:59 AoE
Clarification:
The given code uses groundtruth intrinsics for the egocentric setting but not for the allocentric setting (see here). To be compatible with our baselines, you may assume intrinsics in the egocentric setting.
In the test set, we do not assume groundtruth wrist depth during evaluation.
Acknowledgement
Constructing ARCTIC was a huge undertaking. The authors deeply thank: Tsvetelina Alexiadis (TA) for trial coordination; Markus Höschle (MH), Senya Polikovsky, Matvey Safroshkin, Tobias Bauch (TB) for the capture setup; MH, TA and Galina Henz for data capture; Nima Ghorbani for MoSh++; Priyanka Patel for alignment; Leyre Sánchez Vinuela, Andres Camilo Mendoza Patino, Mustafa Alperen Ekinci for data cleaning; TB for Vicon support; MH and Jakob Reinhardt for object scanning; Taylor McConnell for Vicon support and data cleaning coordination; Benjamin Pellkofer for IT/web support; Neelay Shah for evaluation server. We also thank Adrian Spurr and Xu Chen for insightful discussion. OT and DT were partially supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B. DT’s work was partially performed at the MPI-IS.
Our visualization benefits hugely from AITViewer.