Challenge

Challenge Overview

Continual learning involves the challenging problem of training a model on a non-stationary stream of experiences. Current benchmarks for continual learning often use a very specific type of stream, in which each experience is seen only once and there is no overlap between experiences; in this setting, the experiences are often referred to as 'tasks'. Although such benchmarks have proven useful for academic purposes, they do not reflect the arbitrary non-stationarity observed in the real world: for example, they do not contain any repetition.

The challenge DevKit can be accessed on Github: https://github.com/ContinualAI/clvision-challenge-2023

Challenge Goals

In this challenge, the goal is to design efficient strategies for a class of continual learning problems we refer to as Class-Incremental with Repetition (CIR). CIR encompasses a variety of streams with two key characteristics: (i) previously observed classes can re-appear in a new experience with arbitrary repetition patterns, and (ii) not all classes have to appear in every experience. Since many existing strategies were developed for continual learning problems without repetition, it is unclear how they would perform and compare on CIR streams. To explore the significance of repetition and its relevance for developing novel strategies, we provide a set of CIR benchmarks created by a stream generator controlled by a small number of parameters with clear interpretation. Participants are asked to develop strategies that, after the model has finished training on the entire stream, achieve high average accuracy on a test set containing an equal number of unseen examples of every class in the stream.
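To make these two characteristics concrete, here is a purely illustrative toy stream (not one of the challenge streams), written as a minimal Python sketch:

# Purely illustrative toy CIR stream: each experience exposes only a subset
# of the dataset's classes, and previously seen classes can repeat later.
toy_stream = [
    {"experience": 0, "classes": [0, 1, 2]},   # classes 0-2 appear for the first time
    {"experience": 1, "classes": [1, 3]},      # class 1 repeats, class 3 is new
    {"experience": 2, "classes": [0, 3, 4]},   # classes 0 and 3 repeat, class 4 is new
]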

CIR Stream Generator

Given a static dataset with multiple classes, the generator used for this challenge creates random streams via four interpretable control parameters: the first-occurrence distribution Pf, the per-class repetition probabilities Pr, the number of experiences N, and the number of samples per experience S.

The "first occurrence" control parameter Pf determines when each dataset class appears for the first time in the stream. For instance, in one stream all classes may appear for the first time at the beginning, while in another stream new classes may appear throughout the stream with equal probability. It is important to investigate how changing Pf affects the model's learning, and thus to design strategies that are robust to such changes in the stream. Below are examples of streams generated by varying Pf while fixing Pr = {0.2, 0.2, ..., 0.2}, N = 50, and S = 2000.


Example 1: Most of the classes are observed in the first 5 experiences. Pf: Geometric, p = 0.6, 0 ≤ i ≤ 49.

Example 2: Most of the classes are observed before experience 20. Pf: Geometric, p = 0.3, 0 ≤ i ≤ 49.

Example 3: Novel classes can appear throughout the stream. Pf: Geometric, p = 0.01, 0 ≤ i ≤ 49.
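As an illustration of how such a Pf could be realized, the sketch below samples, for each class, the experience in which it first appears from a geometric distribution truncated to the stream length. This is not the DevKit's generator code; the function name and the truncate-and-renormalize choice are assumptions.

import numpy as np

def sample_first_occurrence(n_classes, n_experiences, p, rng):
    # Geometric pmf over experience indices 0 <= i <= n_experiences - 1,
    # truncated to the stream length and renormalized (an assumption; the
    # official generator may treat the tail differently).
    i = np.arange(n_experiences)
    pf = (1.0 - p) ** i * p
    pf = pf / pf.sum()
    # One draw per class: the index of the experience where it first occurs.
    return rng.choice(n_experiences, size=n_classes, p=pf)

rng = np.random.default_rng(0)
# p = 0.6 concentrates first occurrences in the first few experiences,
# while p = 0.01 spreads them across the whole stream (cf. Examples 1 and 3).
print(sample_first_occurrence(100, 50, 0.6, rng))
print(sample_first_occurrence(100, 50, 0.01, rng))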

Challenge Streams

Number of experiences: 50

Number of samples in each experience: max 2000

* Samples are equally divided among the classes present in each experience; for example, an experience containing 8 classes has at most 2000/8 = 250 samples per class.


The challenge stream configurations can be found here:
https://github.com/ContinualAI/clvision-challenge-2023/tree/main/scenario_configs


Evaluation and Common Rules

Participants are challenged to develop new strategies using the provided DevKit, with the goal of achieving the highest test accuracy on the CIR streams, evaluated on a fixed test set.

The challenge will be articulated in two different phases:

The pre-selection phase: participants will be asked to run experiments on their own machines. The CodaLab platform will be used to gather the model outputs for the test set (which is released without ground-truth annotations) and to compute the submission score;

Final evaluation: the top five strategies with the highest average test accuracy will be evaluated on novel CIR streams that are similar to the ones provided in the DevKit but with small variations in stream generation parameters. These variations are intended to test the robustness of the strategies submitted. The top strategy will be announced as the winner.

The teams behind the top five strategies may be asked to submit a report and prepare a short presentation to be given during the workshop. Reports may also optionally be requested from teams that have submitted interesting solutions, even among the non-winning ones.


Restrictions

Submission: participants must submit a single strategy, and the submitted predictions for all challenge streams must be produced by that same strategy.

Strategy Design: within each experience, participants have full access to the data of that experience. In the default settings of the DevKit, the model is trained on each experience for 20 epochs. Participants are free to tailor the epoch iterations and dataset loading; for example, one may train for more epochs on the initial experiences and fewer on the final ones, depending on a particular criterion (a sketch of such a schedule is given below).
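The following is a minimal sketch of such a schedule. It assumes an Avalanche-style training loop in which the DevKit exposes a benchmark object with a train_stream and a strategy object with a train_epochs attribute; the exact names in the DevKit's starting template may differ.

# Hypothetical per-experience epoch schedule (names are assumptions, not the
# official DevKit API): train longer on early experiences, shorter on later ones.
for i, experience in enumerate(benchmark.train_stream):
    strategy.train_epochs = 30 if i < 10 else 15
    strategy.train(experience)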

Model Architecture: all participants must use the (Slim-)ResNet-18 provided in the DevKit as the base architecture for their models. However, they are allowed to add additional modules, e.g. gating modules, as long as they do not exceed the maximum GPU memory and RAM usage allowed for the competition.

Replay Buffer: replay buffers may not be used to store dataset samples. However, buffers may be used to store any form of data representation, such as the model's internal representations; a sketch of such a buffer is given below.
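As an illustration of what this rule permits, the sketch below stores feature vectors produced by the model's backbone (internal representations) rather than raw images, using reservoir sampling to keep the buffer bounded. All class and method names here are hypothetical and not part of the DevKit.

import random
import torch

class LatentReplayBuffer:
    # Stores detached backbone activations and labels, never raw dataset samples.
    def __init__(self, capacity):
        self.capacity = capacity
        self.features, self.labels = [], []
        self.seen = 0

    def add(self, feats, ys):
        # feats: a batch of internal representations, e.g. the output of the
        # (Slim-)ResNet-18 feature extractor for the current mini-batch.
        for f, y in zip(feats.detach().cpu(), ys.cpu()):
            self.seen += 1
            if len(self.features) < self.capacity:
                self.features.append(f)
                self.labels.append(y)
            else:
                j = random.randrange(self.seen)   # reservoir sampling
                if j < self.capacity:
                    self.features[j], self.labels[j] = f, y

    def sample(self, batch_size):
        idx = random.sample(range(len(self.features)),
                            min(batch_size, len(self.features)))
        return (torch.stack([self.features[i] for i in idx]),
                torch.stack([self.labels[i] for i in idx]))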


Hardware Limitations

Number of GPUs: Participants are allowed to use one GPU for training only. 

Hardware usage (controlled by the DevKit after each experience in the stream):

* These restrictions are based on a training session conducted on Google Colab for a strategy that combines EWC and LwF.


Tentative Schedule



Challenge Portal

To participate in the challenge, use the link below:

https://codalab.lisn.upsaclay.fr/competitions/11559

THE PRE-SELECTION PHASE OF THE CHALLENGE HAS NOW FINISHED.