Challenge Overview
Continual learning involves the challenging problem of training a model on a non-stationary stream of experiences. Currently, benchmarks for continual learning often use a very specific type of stream, in which each experience is only seen once and with no overlap between the experiences. In this case, the experiences are often referred to as 'tasks'. Although these benchmarks have proven useful for academic purposes, they do not reflect the arbitrary non-stationarity that can be observed in the real world. For example, these benchmarks do not contain any repetition.
The challenge DevKit can be accessed on Github: https://github.com/ContinualAI/clvision-challenge-2023
Challenge Goals
In this challenge, the goal is to design efficient strategies for a class of continual learning problems we refer to as Class-incremental with Repetition (CIR). CIR encompasses a variety of streams with two key characteristics: (i) previously observed classes can re-appear in a new experience with arbitrary repetition patterns, and (ii) not all classes have to appear in every experience. Since many existing strategies were developed for continual learning problems without repetition, it is unclear how they would perform and compare in CIR streams. To explore the significance of repetition and its relevance for developing novel strategies, we provide a set of CIR benchmarks created by a stream generator that is controlled by four parameters with clear interpretation. Participants are asked to develop strategies that, after the model has finished training on the entire stream, achieve high average accuracy on a test set that contains an equal number of unseen examples of all classes in the stream.
CIR Stream Generator
Given a static dataset with multiple classes, the generator used for this challenge creates random streams via four interpretable control parameters:
Stream Length (N): number of experiences in the stream.
Experience Size (S): number of patterns in each experience. By default, experience size is equally divided between present classes in each experience.
First Occurrence Distribution (Pf): a discrete probability distribution over the experiences in the stream that determines how the first occurrences of dataset classes can happen throughout the stream.
Repetition Probability (Pr): per-class repetition probabilities that control how likely it is for each class to re-appear after its first occurrence in the stream. In the simplest form, it is a list of probability values, one for each class.
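To make the four parameters concrete, the sketch below shows one way such a generator could work. This is a hypothetical illustration, not the DevKit implementation: `generate_cir_stream` and its arguments are names invented here. Each class gets a first-occurrence experience drawn from Pf, and after that it re-appears in any later experience with its per-class probability from Pr.

```python
import random

def generate_cir_stream(n_classes, N, S, first_occurrence, p_r, seed=0):
    """Hypothetical CIR stream generator (illustration only, not the DevKit API).

    first_occurrence: callable taking an RNG and returning an experience index
    in [0, N) for a class's first appearance (i.e., a sample from Pf).
    p_r: list of per-class repetition probabilities (Pr).
    Returns a list of N experiences, each a dict {class: n_samples}.
    """
    rng = random.Random(seed)
    first = {c: first_occurrence(rng) for c in range(n_classes)}
    stream = []
    for i in range(N):
        # A class is present at its first occurrence, or repeats with prob. p_r.
        present = [c for c in range(n_classes)
                   if first[c] == i or (first[c] < i and rng.random() < p_r[c])]
        # Experience size S is divided equally between the present classes.
        per_class = S // len(present) if present else 0
        stream.append({c: per_class for c in present})
    return stream
```

For example, `generate_cir_stream(10, 50, 2000, lambda rng: rng.randrange(5), [0.2] * 10)` produces a 50-experience stream where all 10 classes first appear within the first 5 experiences and each repeats with probability 0.2 afterwards.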
The "first occurrence" parameter Pf determines when dataset classes appear for the first time in the stream. For instance, in one stream all classes may first appear near the beginning, while in another new classes may appear with equal probability throughout the stream. It is important to investigate how changing Pf affects the model's learning, and to design strategies that are robust to such changes in the stream. Below are examples of streams generated by varying Pf while fixing Pr = {0.2, 0.2, ..., 0.2}, N = 50, and S = 2000.
Example 1
Most of the classes are observed in the first 5 experiences.
Pf:
Type=Geometric
p=0.6, 0 ≤ i ≤ 49
Example 2
Most of the classes are observed before experience 20.
Pf:
Type=Geometric
p=0.3, 0 ≤ i ≤ 49
Example 3
Novel classes can appear throughout the stream.
Pf:
Type=Geometric
p=0.01, 0 ≤ i ≤ 49
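The three examples above use a geometric distribution truncated to the 50 experiences. The snippet below (a sketch; the normalization choice is an assumption, not taken from the DevKit) shows how the parameter p shifts the first-occurrence mass toward the start of the stream.

```python
def truncated_geometric_pmf(p, n):
    """Pf(i) proportional to p * (1 - p)**i for 0 <= i < n, renormalized
    so that the probabilities over the n experiences sum to 1."""
    weights = [p * (1 - p) ** i for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

for p in (0.6, 0.3, 0.01):
    pmf = truncated_geometric_pmf(p, 50)
    print(f"p={p}: first-occurrence mass in experiences 0-4 = {sum(pmf[:5]):.2f}")
```

With p = 0.6 almost all classes first appear in the first 5 experiences, while with p = 0.01 the distribution is nearly uniform, so novel classes keep appearing throughout the stream.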
Challenge Streams
Number of experiences: 50
Number of samples in each experience: max 2000
* Samples are equally divided between present classes in each experience.
The challenge stream configurations can be found here:
https://github.com/ContinualAI/clvision-challenge-2023/tree/main/scenario_configs
Evaluation and Common Rules
Participants are challenged to develop new strategies using the DevKit provided, with the goal of achieving optimal test accuracy for the CIR streams with a fixed test set.
The challenge is organized in two phases:
The pre-selection phase: participants will be asked to run experiments on their machines. The Codalab platform will be used to gather the model outputs for the test set (which is released without ground truth annotations) and to compute the submission score;
Final evaluation: the top five strategies with the highest average test accuracy will be evaluated on novel CIR streams that are similar to the ones provided in the DevKit but with small variations in stream generation parameters. These variations are intended to test the robustness of the strategies submitted. The top strategy will be announced as the winner.
The top five strategies might be asked to submit a report and prepare a (short) presentation to be given during the workshop. Report papers may optionally be asked from teams that have submitted interesting solutions, even among non-winning ones.
The DevKit is based on Avalanche and is available in the Challenge Repo. Changing the data loading process, the competition-related modules, or the base model module is not permitted.
Using multiple accounts on CodaLab to increase the number of submissions is prohibited.
The organizers reserve the absolute right to disqualify entries that are incomplete or illegible, late entries, or entries that violate the rules.
Restrictions
Submission: participants must submit only a single strategy, and the submitted predictions for all challenge streams must be from the same strategy.
Strategy Design: within each experience, users have full access to the data of that experience. In the default settings of the DevKit, the model goes through each experience for 20 epochs. Participants are free to tweak and tailor the epoch iterations and dataset loading; for example, one may iterate for more epochs in the initial experiences and fewer in the final ones, depending on a particular criterion.
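As one illustration of tailoring the epoch budget, the sketch below implements a hypothetical schedule (the function and its defaults are assumptions, not part of the DevKit) that linearly decays the per-experience epochs from the default 20 down to a floor, on the reasoning that later experiences largely repeat already-seen classes.

```python
def epochs_for_experience(exp_index, n_experiences=50, base_epochs=20, min_epochs=5):
    """Hypothetical schedule: spend more epochs early in the stream and fewer
    later, linearly interpolating from base_epochs down to min_epochs."""
    frac = exp_index / max(n_experiences - 1, 1)
    return max(min_epochs, round(base_epochs * (1 - frac)))
```

Under this schedule the first experience gets the full 20 epochs and the last gets 5; any criterion (e.g., the fraction of novel classes in the experience) could replace the linear decay.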
Model Architecture: all participants must use the (Slim-)ResNet-18 provided in the DevKit as the base architecture for their models. However, they are allowed to add additional modules, e.g. gating modules, as long as they do not exceed the maximum GPU memory and RAM usage allowed for the competition.
Replay Buffer: Replay buffers may not be used to store dataset samples. However, buffers may be used to store any form of data representation, such as the model's internal representations.
Regardless of buffer type, buffer size (i.e., the total number of stored exemplars) should not exceed 200.
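A rule-compliant buffer can therefore hold at most 200 exemplars of derived representations rather than raw samples. The sketch below (a hypothetical design, not provided by the DevKit) stores latent vectors with their labels and uses reservoir sampling to keep a capped, stream-balanced buffer.

```python
import random

class LatentReplayBuffer:
    """Sketch of a rule-compliant buffer: stores model activations (not raw
    dataset samples), capped at 200 exemplars via reservoir sampling."""

    def __init__(self, capacity=200, seed=0):
        self.capacity = capacity
        self.buffer = []       # list of (latent_vector, label) pairs
        self.seen = 0          # total items observed so far
        self.rng = random.Random(seed)

    def add(self, latent, label):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((latent, label))
        else:
            # Replace a stored item with probability capacity / seen,
            # so every observed item is equally likely to be retained.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = (latent, label)
```

Reservoir sampling keeps the buffer an unbiased sample of everything seen so far without knowing the stream length in advance, which fits the CIR setting where class repetition patterns are unknown.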
Hardware Limitations
Number of GPUs: Participants may use at most one GPU for training.
Hardware usage (controlled by the DevKit after each experience in the stream):
Max GPU Memory Usage: 4000 MB
Max Training Time: 500 Min
* These restrictions are set based on a training session conducted on Google Colab for a strategy that combines EWC and LwF.
Tentative schedule
20th March 2023: Beginning of the competition
The challenge stream config files are released and the competition starts. The challenge portal starts accepting submissions!
20th May 2023: End of the pre-selection phase
The submissions portal will stop accepting submissions. The five highest-ranking participants will be asked to send their solutions and reports to the challenge organizers.
June 2023: Workshop day
Winners will present their solutions!
Challenge Portal
To participate in the challenge, use the link below:
THE PRE-SELECTION PHASE OF THE CHALLENGE HAS NOW FINISHED.