Challenge

Challenge Overview

In real-world machine learning applications, it is fair to assume that data is not cheap to acquire, store, and label. It is therefore crucial to develop strategies that are flexible enough to learn from streams of experiences without forgetting what has been learned previously. Moreover, contextual unlabelled data can be exploited to integrate additional information into the model.

This challenge aims to explore techniques that combine these two fundamental aspects of data efficiency: continual learning and unlabelled data usage.

Strategies can be submitted at the CodaLab competition: https://codalab.lisn.upsaclay.fr/competitions/17780

The challenge DevKit can be accessed on GitHub: https://github.com/ContinualAI/clvision-challenge-2024

The pre-selection phase will run until May 13th 2024.

A prize of 1,000 dollars is sponsored by Apple for the top-ranking participants.


IMPORTANT UPDATE ❗ -- The competition has been extended by one week, until May 13th!

Due to discrepancies in the data configuration files provided in the repository, the pickle configurations have been updated to reflect the scenarios depicted on the challenge website. We kindly ask you to pull the latest commit with the updated configurations, retrain your models, and resubmit your best strategies. The CodaLab leaderboard has been reset, and all participants have regained the original 50 attempts.

We have also updated the scoring function to compensate for some submission mismatches and to ensure that it prioritizes the final accuracy; this is now handled within the submission evaluation. All results have been re-evaluated, but this has no effect on the total number of submissions.

We apologize for any inconvenience this may have caused.


Challenge Goals

The goal of this challenge is to tackle the Class-Incremental with Repetition (CIR) problem by exploiting unlabelled data.

CIR encompasses a variety of scenarios with two key characteristics:

New classes can be introduced in any experience of the stream (the class-incremental aspect).
Previously seen classes can reappear in later experiences (the repetition aspect).

In this competition, each scenario is divided into 50 experiences. In each experience, we have a training session with access to a set of labelled and unlabelled samples. During this time we can train, update, or adapt our model using the labelled data, supported by the unlabelled data. However, once the training session is over, both the labelled and unlabelled data become unavailable. Depending on the scenario, future experiences may provide access to samples from seen, unseen, or distractor classes. Distractor classes represent elements in the stream that can be sampled but are not required to be learned or classified, and therefore never appear in the labelled stream.

Based on these three possibilities (seen, unseen, and distractor classes), three scenarios are proposed to test the robustness of the strategies developed by the participants. Each scenario presents a Labelled Data Stream (LS) and an Unlabelled Data Stream (US). Depending on the scenario, the US can contain samples belonging to:

the same classes present in the LS of the current experience (Scenario 1);
additionally, past or future classes from the whole LS (Scenario 2);
additionally, distractor classes that never appear in the LS (Scenario 3).

Participants are asked to develop strategies that, after the model has finished training on the entire stream of experiences, achieve high average accuracy on an evaluation test set which contains a balanced number of unseen samples from all classes in the LS. The proposed strategy will learn a different model for each scenario, but has to apply the same algorithm to all three scenarios.
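As a rough sketch of the training protocol described above (all names below are placeholders rather than the DevKit API; the real experiences are provided by the Avalanche-based DevKit), a training session over the stream could look like the following, where the labelled and unlabelled data are only available inside the current experience:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

# Illustrative outline of the experience loop. `labelled_stream` and
# `unlabelled_stream` are placeholders: in the challenge, experiences
# come from the Avalanche-based DevKit, not from these objects.
def train_on_stream(model, labelled_stream, unlabelled_stream, device="cuda"):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    model.to(device)

    for exp_id, (ls_data, us_data) in enumerate(zip(labelled_stream, unlabelled_stream)):
        # Training session: both datasets are only accessible inside this loop body.
        ls_loader = DataLoader(ls_data, batch_size=32, shuffle=True)
        us_loader = DataLoader(us_data, batch_size=64, shuffle=True)

        model.train()
        for (x, y), u in zip(ls_loader, us_loader):  # assumes the unlabelled loader yields images only
            x, y, u = x.to(device), y.to(device), u.to(device)
            loss = criterion(model(x), y)
            # An unsupervised / self-supervised term computed on `u` could be added here.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Once the session ends, ls_data and us_data may not be stored or reused.
```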

Metrics

We evaluate all submissions with the classic final accuracy on an evaluation test set. The final accuracy metric measures the accuracy of predictions on a test set containing novel instances from all previously seen classes; the test set contains a balanced representation of new instances from all classes seen as labelled during the training sequence. No images from the distractor classes are included. After training on each scenario, competitors have to provide a class prediction for each image in the test set.

The leaderboard will be ranked by the average final accuracy across the three scenarios. However, we will also provide a tie-breaking metric within the leaderboard: the convergence rate of accuracy over experiences. After each experience, a prediction on the test set is provided and the corresponding accuracy is stored; this accuracy is calculated only over the test samples belonging to the classes seen so far. The convergence rate combines these per-experience accuracies through weights w_j that give more importance to variance within the later experiences.
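The exact formulas are part of the organizers' evaluation code; as a purely illustrative sketch (the helper names and the weighting scheme below are assumptions, not the official definitions), the two quantities could be computed roughly as follows:

```python
import numpy as np

def final_accuracy(acc_per_exp: np.ndarray) -> float:
    """Accuracy on the full test set after the last experience (main ranking metric)."""
    return float(acc_per_exp[-1])

def convergence_rate(acc_per_exp: np.ndarray) -> float:
    """Weighted measure of how the accuracy converges over experiences.

    The increasing weights w_j below are only an illustrative choice that
    emphasises fluctuations in the later experiences, as described above.
    """
    n = len(acc_per_exp)
    w = np.arange(1, n + 1, dtype=float)
    w /= w.sum()                                    # normalised weights w_j
    step_changes = np.diff(acc_per_exp, prepend=acc_per_exp[0])
    return float(np.sum(w * step_changes ** 2))
```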

For more information on the format of the prediction submission, check the details here: https://github.com/ContinualAI/clvision-challenge-2024.

Scenarios

This challenge consists of three scenarios based on an ImageNet-like computer vision dataset with a fixed number of classes. Each scenario consists of 50 experiences, each with 500 labelled images and 1,000 unlabelled images. These images constitute the above-mentioned Labelled Data Stream (LS) and Unlabelled Data Stream (US) at each experience.

Samples are equally balanced among the classes present in each experience. More details on the scenario distributions can be found at the bottom of this page.

Evaluation and Common Rules

Participants are challenged to develop new strategies using the provided DevKit. The challenge is articulated in two phases: a pre-selection phase, during which participants submit predictions to the CodaLab leaderboard, and a final evaluation phase, during which the organizers re-evaluate the highest-ranking strategies on novel scenarios.

The top-ranking teams might be asked to submit a report and prepare a (short) presentation to be given during the workshop. Report papers may also be requested from teams that submitted interesting solutions, even among the non-winning ones.

The DevKit is based on Avalanche. Changing the data loading process and competition-related modules is not permitted.

Participants are allowed to work in teams, but only one member can submit predictions to the CodaLab system. Each team is allowed 3 submissions per day, with a limit of 50 total submissions throughout the competition. Using multiple accounts on CodaLab to increase the number of submissions is prohibited.

The organizers reserve the absolute right to disqualify entries that are incomplete or illegible, late entries, or entries that violate the rules.

Strategy Restrictions

Submission: for each submission, the predictions for the three scenarios must come from the same strategy. The strategy must be able to solve the three settings without access to a scenario ID, since it will also have to work on the novel scenarios in the final phase. In general, this can be seen as requiring the strategy to solve the more complex Scenario 3 while still handling the simpler experience sequences of Scenarios 1 and 2. No data from external sources can be used.

Strategy Design: within each experience, users have full access to the data of that experience; no data from other experiences can be accessed. The default number of epochs and the training regime for each experience can be modified, and participants are free to adapt and tailor the epoch iterations and dataset loading. As an example, one may iterate for more epochs in the initial experiences and fewer in the final ones, depending on a particular criterion.
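For instance, one simple hypothetical criterion is a linearly decaying epoch budget over the 50 experiences (the numbers below are arbitrary, not a recommendation):

```python
def epochs_for_experience(exp_id: int, n_experiences: int = 50,
                          max_epochs: int = 10, min_epochs: int = 2) -> int:
    """Hypothetical schedule: more epochs early on, fewer towards the end.

    Any such schedule must still respect the overall training-time limit.
    """
    frac = exp_id / max(n_experiences - 1, 1)
    return round(max_epochs - frac * (max_epochs - min_epochs))

# Example: 10 epochs on experience 0, decaying to 2 epochs on experience 49.
```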

Model Architecture: all participants must use the ResNet-18 provided in the DevKit as the base architecture for their models. However, they are allowed to add additional modules, e.g. gating modules, as long as they do not exceed the maximum GPU memory allowed for the competition. The model cannot be initialized using pretrained weights.
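As a minimal sketch, assuming torchvision's ResNet-18 stands in for the DevKit's base model (the official architecture is the one shipped with the DevKit, and the number of classes below is a placeholder), an additional gating module could be attached as follows:

```python
import torch
from torch import nn
from torchvision.models import resnet18

class GatedResNet18(nn.Module):
    """ResNet-18 backbone with an extra feature-gating module (illustrative only)."""

    def __init__(self, num_classes: int = 100, feat_dim: int = 512):
        super().__init__()
        self.backbone = resnet18(weights=None)   # random init: pretrained weights are not allowed
        self.backbone.fc = nn.Identity()         # expose the 512-d feature vector
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        return self.classifier(feats * self.gate(feats))   # gated features -> logits
```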

Replay Buffer: Replay buffers may not be used to store dataset images. However, buffers may be used to store any form of data representation or statistics, such as the model's internal representations. Regardless of buffer type, the buffer size (i.e., the total number of stored exemplars) should not exceed 200.
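One possible, non-prescriptive way to respect this cap is a reservoir-sampled buffer of internal representations; the class below is an illustrative sketch, not part of the DevKit:

```python
import random
import torch

class FeatureBuffer:
    """Stores at most `capacity` (feature, label) pairs; raw images are never stored."""

    def __init__(self, capacity: int = 200):
        self.capacity = capacity
        self.items = []       # list of (feature tensor, label) pairs
        self.seen = 0

    def add(self, feature: torch.Tensor, label: int) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((feature.detach().cpu(), label))
        else:
            # Reservoir sampling: keep a uniform subset of all exemplars seen so far.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.items[idx] = (feature.detach().cpu(), label)

    def sample(self, batch_size: int):
        return random.sample(self.items, min(batch_size, len(self.items)))
```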

Hardware Limitations

Number of GPUs: Participants are allowed to use 1 GPU for training only.

Max GPU Memory Usage: 8000 MB

Max Training Time: 600 minutes

Hardware usage is monitored by the DevKit after each experience. These restrictions are set based on training sessions conducted locally for baseline strategies.

As a reference, the submitted strategies will be evaluated during the final phase on a machine with an NVIDIA TITAN RTX (24 GB of GPU memory), 64 GB of RAM, and 12 CPU cores.
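As a rough local sanity check (the official monitoring is performed by the DevKit, as noted above), peak GPU memory usage can be inspected with PyTorch:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training session of one experience here ...
peak_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
assert peak_mb <= 8000, f"Peak GPU memory {peak_mb:.0f} MB exceeds the 8000 MB limit"
```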

Schedule (all times are in the AoE timezone)

20th February 2024: Beginning of the competition, start of the pre-selection phase.

The challenge scenario config files are released together with the DevKit. The CodaLab leaderboard starts accepting submissions!

13th May 2024: End of the pre-selection phase, start of final evaluation phase.

The submission portal will stop accepting submissions. The highest-ranking participants will be asked to send their solutions and reports to the challenge organizers for the final evaluation.

18th May 2024: End of final evaluation phase.

The organizers will evaluate the strategies and reports from the highest-ranking participants on the novel scenarios and prepare a final ranking to be revealed on the workshop day. Participants with valid strategies will be asked to present them during the workshop.

18th June 2024: Workshop day.

Winners will present their solutions!

Challenge Portal

To participate in the challenge, use the link: https://codalab.lisn.upsaclay.fr/competitions/17780

Scenario Distributions

Scenario 1: LS and US contain the same classes in each experience.

Scenario 2: US contains the same classes as LS, as well as past or future classes from the whole LS.

Scenario 3: US contains the same classes as LS, as well as past or future classes from the whole LS, and distractor classes not present in LS.