Challenge

Challenge Overview

The popularity of the continual learning (CL) field has rapidly increased in recent years. The ability to adapt to new environments and learn new skills will be an essential building block towards the creation of autonomous agents.

Continual Learning (CL) as a field focuses on developing algorithms with the ability to accumulate knowledge and skills by interacting with non-stationary environments and data distributions. In typical continual learning scenarios, models are trained over a sequence of training episodes (experiences), with access to only a portion of the training dataset in each experience. This very challenging problem has generated a lot of interest in recent years, with new solutions appearing rapidly. This workshop challenge introduces a more realistic benchmark with scenes featuring everyday objects and environments through a new dataset: EgoObjects.

Continual learning and computer vision form a long-standing research pairing. To date, most research efforts have been directed towards the object classification problem. The importance of this problem cannot be overstated, as it is the natural first step towards building continuously learning systems for vision applications. Object detection, however, has received far less attention in the continual learning setting.

Another direction of interest is instance-level object recognition, in which the goal is to predict which specific object is depicted. This is in contrast with category-level recognition, where the goal is to predict the general category of the depicted objects.

The goal of the challenge

We believe that the goal of a challenge is to stimulate the research community to produce new and more effective solutions in promising research directions. Challenges hosted at previous CLVision workshops featured novel elements of complexity that the participants had to overcome. The challenge hosted at the 1st CLVision workshop proposed a difficult benchmark made of nearly 400 incremental experiences. The challenge hosted at the 2nd CLVision workshop proposed a complex continual reinforcement learning benchmark. For this workshop, we propose a continual learning challenge with three different tracks: a more “classic” classification track and two novel detection tracks.

The EgoObjects Dataset

The challenge is supported by the EgoObjects dataset provided by Meta, a massive-scale egocentric dataset for objects. EgoObjects is a video dataset created to push the frontier of open-world object understanding from a Continual Learning perspective.

[Images: EgoObjects dataset logo and video previews]

Main features:

  • Videos have been taken with a wide range of egocentric recording devices (Ray-Ban Stories, Snap Spectacles, and mobile phones) in realistic household/office environments from 25+ countries and regions.

  • Videos feature a great variety of lighting conditions, scale, camera motion, and background complexity.

  • Video frames have been sampled and annotated with rich ground truth, including the object category, object instance ID, and 2D bounding box.

  • Each video depicts one main object; the surrounding objects in the same video are also annotated.

  • The annotation format is similar to the one used by the LVIS dataset, which is in turn very similar to the COCO format (an illustrative sketch is provided below, after this list).

  • The main object is used in the object classification track (by cropping the relevant portion of the image) and class labels are taken at the instance level.

  • The scene complexity (number of objects, occlusions, etc.) is lower than in COCO, but the image quality is more varied!


The challenge test set will NOT contain annotations.
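
As a reference, the sketch below shows (in Python) what a COCO/LVIS-style annotation file may look like, and how the main-object crop used by the classification track could be obtained. Field names and values are illustrative assumptions based on the COCO conventions, not the official EgoObjects schema.

    # Illustrative sketch of a COCO/LVIS-style annotation file: field names and
    # values are assumptions based on the COCO conventions, NOT the official
    # EgoObjects schema.
    example_annotations = {
        "images": [
            {"id": 1, "file_name": "1.jpg", "width": 1920, "height": 1080},
        ],
        "annotations": [
            {
                "id": 10,
                "image_id": 1,
                "category_id": 5,                     # category-level label
                "instance_id": 42,                    # instance-level label (assumed field name)
                "bbox": [120.0, 75.0, 300.0, 210.0],  # [x, y, width, height]
            },
        ],
        "categories": [
            {"id": 5, "name": "coffee_mug"},
        ],
    }

    # The classification track uses a crop of the main object (here via Pillow).
    from PIL import Image

    image = Image.open("1.jpg")
    x, y, w, h = example_annotations["annotations"][0]["bbox"]
    main_object_crop = image.crop((int(x), int(y), int(x + w), int(y + h)))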


Prizes

Total challenge prizes: $10,000!

  • Instance Classification track

    • 1st place: 1500 USD

    • 2nd place: 900 USD

    • 3rd place: 600 USD

  • Category Detection track

    • 1st place: 1700 USD

    • 2nd place: 1100 USD

    • 3rd place: 700 USD

  • Instance Detection track

    • 1st place: 1700 USD

    • 2nd place: 1100 USD

    • 3rd place: 700 USD


Challenge tracks

For all these tracks, a dataset for continual object detection and classification will be used. The dataset features short video sessions taken from an egocentric point of view. As a reference, similar datasets are CORe50 and OpenLORIS.

Mainstream detection datasets like VOC, COCO, or LVIS could have been used, but their complexity would be ill-matched to the current state of research on continual detection, with obvious discouraging effects. The goal of the detection tracks is to encourage researchers in the CLVision field to take a first step into the continual object detection task.

The challenge is organized in the following 3 tracks:

  1. Continual instance-level object classification track
    This is the most classic track. The submitted solution should be able to handle a stream of training experiences containing images of common household/workplace objects. The solution will be able to access the ground-truth label of each training image in order to incrementally train its internal knowledge model (fully supervised). Images will depict a single object and the expected prediction is a classification label. The solution must return predictions at the instance level: that is, solutions will need to distinguish between different objects belonging to the same category.

  2. Continual category-level object detection track
    In this track, incremental experiences will carry short videos of common household/workplace objects, depicted in everyday household and workplace environments, with each image containing more than one object. The goal is to predict the bounding box and label of the depicted objects. This can be a very good starting point when first approaching continual detection tasks: images are far less cluttered and complex than the ones in mainstream detection datasets, and popular continual learning approaches (replay, regularization, …) can really make a difference.

  3. Continual instance-level object detection track
    In this track, incremental experiences will carry short videos of common household/workplace objects. Unlike its category-level counterpart, the goal of this track is to predict object labels at the instance level. Each video will feature a single “reference” object (possibly surrounded by other unrelated objects), and the goal is to predict the position and instance label of that reference object. This may be the hardest track, but solutions for the category detection track can easily be adapted to cover this scenario.

Evaluation and common rules

Each track has its own specific rules, but the following are common to all tracks:


  • The challenge will be articulated in two different phases:

    1. the pre-selection phase: participants will be asked to run experiments on their own machines. The CodaLab platform will be used to gather the model outputs for the test set (which is released without ground-truth annotations) and to compute the submission score;

    2. the final evaluation phase: the top 5 teams will be asked to send a working copy of their solution (code and related resources) to the challenge organizers. Organizers will check that the submission follows the rules and that results can be reproduced.

  • Winners will be asked to submit a report and prepare a (short) presentation to be given during the workshop. Non-winning teams that submitted particularly interesting solutions may also be asked to provide a report.

  • The devkit for all three tracks is based on Avalanche and is available in the Challenge Repo. Changing the data loading process, including the order in which training and test data are encountered, is not permitted. Other parts of the training loop can be customized, including the batch size, shuffling order, and so on (a minimal training-loop sketch is shown right after this list).

  • Using multiple accounts on CodaLab to increase the number of submissions is prohibited.

  • The organizers reserve the absolute right to disqualify entries that are incomplete or illegible, late entries, or entries that violate the rules.
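
As an illustration of how a solution may interact with the devkit, here is a minimal training-loop sketch for the classification track. It is not the official devkit code: the benchmark object is assumed to come from the devkit's benchmark generation procedure (whose data and ordering must not be changed), while the model, optimizer, and image transform are placeholders you are free to replace.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, Dataset
    from torchvision import transforms, models

    # Minimal training-loop sketch for the classification track. This is NOT the
    # official devkit code: `benchmark` is assumed to be the object produced by
    # the devkit's benchmark generation procedure (its data and ordering must not
    # be changed); the model, optimizer and image transform are placeholders.

    to_tensor = transforms.Compose([transforms.Resize((224, 224)),
                                    transforms.ToTensor()])

    class TransformedDataset(Dataset):
        """Applies an image transform to (PIL image, label, task_label) items."""
        def __init__(self, ds):
            self.ds = ds

        def __len__(self):
            return len(self.ds)

        def __getitem__(self, i):
            img, label, task_label = self.ds[i]
            return to_tensor(img), label, task_label

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = models.resnet18(num_classes=1000).to(device)  # placeholder backbone/head
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    # Incrementally train over the stream of experiences provided by the devkit.
    for experience in benchmark.train_stream:
        loader = DataLoader(TransformedDataset(experience.dataset),
                            batch_size=32, shuffle=True)
        model.train()
        for x, y, _task_label in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # A continual learning strategy (replay, regularization, ...) would
        # typically update its buffers / penalties here, between experiences.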


For track-specific rules, metrics, and resources, please refer to the appropriate page.


Reference evaluation server

  • AMD EPYC 7282, 128 GB RAM @ 2666 MHz

  • Nvidia Quadro RTX 5000

  • SSD WD SA210SFF


Resources


Note: CodaLab homepage reports that there will be a power outage on April 5, 2022, 7 am - 2 pm CET.


Tentative schedule

  • 7th March 2022 (previously 4th March 2022): Beginning of the demo track (demo dataset)
    The demo dataset will be a smaller version of the challenge dataset. Use it to start working on your solution in advance ;)

  • 30th March 2022 (previously 27th March 2022): Beginning of the competition
    The challenge dataset is released and the competition tracks are started. The challenge portal will start accepting submissions!

  • 29th May 2022: End of the pre-selection phase
    The submissions portal will stop accepting submissions. The 5 highest-ranking participants of each track will be asked to send their solutions and reports to the challenge organizers.

  • June 2022: Workshop day
    Winners will present their solutions!

FAQ

  • Can I use a pretraining dataset that is not listed on the track web page?
    No, pretraining is only allowed using the listed datasets.

  • During the training phase, can the overall model size (or the sum of the sizes of all models, if using more than one model) be > 70M parameters?
    Yes, there are no constraints on the model sizes used at training time (apart from the physical limits of the evaluation server). However, once the training phase is completed, the solution may only keep model(s) totaling at most 70M parameters overall (plus the replay buffer). A parameter-counting sketch is provided at the end of this FAQ.

  • When using a replay buffer, can I store all annotations for a given image?
    Yes, you are allowed to store all annotations for the training images you choose to keep. For the classification track, this means you can keep the true label of each stored image. For the detection tracks, you can keep the bounding boxes and their associated labels. If you remove an image from the replay buffer, you will have to remove the annotations coming from that image as well.

  • Is the use of Avalanche mandatory?
    Avalanche is not mandatory, but you are required to use the given benchmark generation procedure. The dataset field in each experience is an object that returns a three-element tuple:
    (PIL Image, classification label or detection annotations, task label). The task label is always 0 for this challenge. It should be easy to use those datasets outside Avalanche (or even PyTorch); see the short sketch at the end of this FAQ.

  • I got errors regarding images named "<image_id>.jpg" that cannot be found.
    In late April, a different version of the image .zip and of the related .json files was uploaded at the same download links you received via email. The new zip is smaller (28 GB instead of 43 GB) because unused images (images not listed in the jsons) have been removed. In addition, image file names have been standardized to "<image_id>.jpg". No other data has changed and this will not affect your algorithm in any way (the data is exactly the same). The error is most likely caused by using the new json files with images from the old zip (named with the older scheme), or vice versa (new image names, old json files).
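
Regarding the 70M-parameter limit discussed above, the following minimal PyTorch sketch shows one way to check the size of the kept model(s); the model names in the commented example are placeholders.

    import torch.nn as nn

    def total_parameters(*models: nn.Module) -> int:
        """Sum the number of parameters across all kept models."""
        return sum(p.numel() for m in models for p in m.parameters())

    # Example with placeholder models: the kept solution must not exceed
    # 70M parameters overall (the replay buffer is counted separately).
    # assert total_parameters(backbone, classifier_head) <= 70_000_000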
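
To complement the FAQ entry on Avalanche, this short sketch shows how an experience dataset could be consumed directly, outside Avalanche (and outside a PyTorch training loop); the experience object is assumed to come from the devkit's benchmark.

    # Each item of an experience dataset is a (PIL Image, target, task_label)
    # tuple; `experience` is assumed to come from the devkit's benchmark.
    dataset = experience.dataset

    for i in range(min(3, len(dataset))):
        image, target, task_label = dataset[i]
        assert task_label == 0  # the task label is always 0 in this challenge
        # For the classification track, `target` is the instance-level class id;
        # for the detection tracks, it contains the detection annotations.
        print(image.size, target)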

MORE INFO TO COME IN THE COMING WEEKS. BOOKMARK THIS PAGE AND STAY TUNED!