Instance Detection Track

The “Continual instance-level object detection” track features a stream of 5 experiences and follows a fully supervised learning scheme: annotations are provided for the training set.


The stream of training experiences is built using the ID of the main object, resulting in a Class-Incremental scenario. Only the main object is annotated in each image. No task labels or other additional signals are provided at test time.

Getting started

The devkit repository, which also contains most of the technical information and guides, is shared among all tracks and can be found here.


The dataset can be downloaded by filling out this form. A download link will be immediately sent after filling out the form, so make sure you provide an active email address.


The dataset is the same for both the classification and detection tracks. For this track, the original, uncropped images are used. The devkit exposes incremental experiences as PyTorch datasets returning (image, targets) pairs that follow the format used in the TorchVision Object Detection Finetuning Tutorial: we warmly recommend having a look at it!
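
As a hedged illustration (not the devkit's actual code), a dataset in that format returns targets structured roughly as in the sketch below; the field names follow the TorchVision tutorial, while the dataset class and its contents are made up.

```python
import torch
from torch.utils.data import Dataset

# Minimal sketch of a detection dataset in the TorchVision tutorial format.
# This is NOT the devkit implementation: image loading and annotation
# parsing are placeholders.
class ToyDetectionDataset(Dataset):
    def __init__(self, images, annotations):
        self.images = images            # list of (C, H, W) image tensors
        self.annotations = annotations  # list of per-image annotation dicts

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        target = {
            # boxes in (x1, y1, x2, y2) format, one row per annotated object
            "boxes": torch.as_tensor(ann["boxes"], dtype=torch.float32),
            # instance ID of the annotated main object
            "labels": torch.as_tensor(ann["labels"], dtype=torch.int64),
            "image_id": torch.tensor([idx]),
            "area": torch.as_tensor(ann["area"], dtype=torch.float32),
            "iscrowd": torch.zeros(len(ann["boxes"]), dtype=torch.int64),
        }
        return self.images[idx], target
```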


Make sure to check the list of available resources on the main challenge page here.

Rules

  • The benchmark creation procedure can't be changed.

  • The size of the model obtained after each training iteration (including the final model) is limited to 70M parameters (measured as the number of parameters, not the size in MBytes); a parameter-count check is sketched after this list.

  • The total number of classes is known from the beginning and can be taken into consideration to pre-allocate the classification head, parts of the model, etcetera (see the template in the devkit).

  • Model initialization can be done by randomly initializing weights or by pretraining using the ImageNet-1K (ImageNet 2012), COCO2017, or LVIS datasets. Apart from the pre-trained weights, the solution must not access/use data from the pretraining dataset.

  • Solutions can exploit a replay buffer containing data coming from up to 5000 training samples (image+annotations). The buffer must be initially empty and can be populated using data from the current experience only. This means that the replay buffer cannot be populated beforehand, nor can it be filled with data from future experiences. The only samples retained from past experiences must be the ones chosen before terminating the training on that experience (a minimal buffer sketch is shown after this list).

  • Test-time training or tuning is not allowed. The solution must be able to predict the output for a test instance immediately after the training phase.

  • Exploiting any correlation that may exist between test instances is not allowed. The solution must be able to return a prediction without accessing other test data.

  • The solution must NOT use information regarding the category of instances, even if they are included in the training and test annotation files.

  • The solution must NOT use the information regarding the video ID at test time. The video ID can be used at training time (for instance, to generate a validation set based on the video ID).

  • The maximum allowed execution time for the whole solution (training+test) is 24 hours. This time will be measured on the reference server described on the challenge page.

  • In addition, all other general rules found on the main challenge page apply!
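
As a quick check for the 70M-parameter budget mentioned above, counting parameters in PyTorch is a one-liner. The detector below is only a placeholder (not a required architecture); any model can be checked the same way.

```python
import torchvision

# Placeholder detector (torchvision >= 0.13 API); substitute your own model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)

n_params = sum(p.numel() for p in model.parameters())
assert n_params <= 70_000_000, f"Model too large: {n_params} parameters"
print(f"{n_params / 1e6:.1f}M parameters")
```

The replay-buffer rule can be satisfied, for instance, with a buffer that is only updated at the end of each experience. The sketch below uses plain reservoir sampling and hypothetical names; it is not part of the devkit.

```python
import random

class ReplayBuffer:
    """Keeps at most `capacity` (image, target) pairs via reservoir sampling."""

    def __init__(self, capacity=5000):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def update_from_experience(self, experience_dataset):
        # Call once, at the end of training on the current experience,
        # so no data from future experiences ever enters the buffer.
        for sample in experience_dataset:
            self.seen += 1
            if len(self.samples) < self.capacity:
                self.samples.append(sample)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.samples[j] = sample
```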


Evaluation


The CodaLab portal for submitting solutions for the Instance Detection track can be found here.


A solution consists of a zip file containing 5 files. Each file must contain the predictions on the full test set obtained after completing the corresponding training experience. The devkit already contains a plugin that will take care of storing such output. The resulting files should be zipped without including intermediate directories inside the archive (and without changing the names of the files); see the sketch below.
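
For reference, a flat archive of this kind can be created as in the sketch below; the directory and file names are hypothetical, and the names produced by the devkit plugin must be kept unchanged.

```python
import zipfile
from pathlib import Path

# Hypothetical location: keep whatever names the devkit plugin produces.
prediction_files = sorted(Path("predictions").glob("*.json"))

with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    for path in prediction_files:
        # arcname=path.name strips the directory, keeping the archive flat.
        archive.write(path, arcname=path.name)
```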


The CodaLab portal reports how many submissions you already uploaded as well as the total and daily limits for submissions. The default behavior of CodaLab is to ignore failed submissions (invalid format, zip structure, etcetera) so that they do not count towards the submissions limit.


The final score is the Average mAP: after each training experience, the mAP on the full test set is computed; these 5 results are then averaged to obtain the final score. The detection metrics are computed using the provided EgoObjects API, which is based on the lvis package.
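
Put differently, if mAP_i is the mAP computed on the test set after training experience i, the final score is simply their mean; the values below are purely illustrative.

```python
# Hypothetical per-experience mAP values (one per training experience).
map_per_experience = [0.42, 0.45, 0.47, 0.48, 0.50]

average_map = sum(map_per_experience) / len(map_per_experience)
print(f"Average mAP: {average_map:.3f}")
```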