Challenge

Challenge Description

During the past few years we have witnessed a renewed and growing attention to Continual Learning (CL) [Parisi, 2019]. The interest in CL is essentially twofold. From the artificial intelligence perspective, CL can be seen as another important step towards the grand goal of creating autonomous agents that can learn continuously and acquire new complex skills and knowledge. From a more practical perspective, CL looks particularly appealing because it enables two important properties: adaptability and scalability. One of the key hallmarks of CL techniques is the ability to update the models by using only recent data (i.e., without accessing old data). This is often the only practical solution when learning on the edge from high-dimensional streaming or ephemeral data, which would be impossible to keep in memory and process from scratch every time a new piece of information becomes available. Unfortunately, when (deep or shallow) neural networks are trained only on new data, they experience a rapid overwriting of their weights, a phenomenon known in the literature as catastrophic forgetting.

In the CLVision workshop we plan to provide a comprehensive 2-phase challenge track to thoroughly assess novel CL solutions in the computer vision context, based on 3 different CL protocols. With this challenge we aim to:

  • Invite the research community to scale up CL approaches to natural images and possibly on video benchmarks.
  • Invite the community to work on solutions that can generalize over multiple CL protocols and settings (e.g. with or without a “task” supervised signal).
  • Provide the first opportunity for comprehensive evaluation on a shared hardware platform for a fair comparison.
  • Provide the first opportunity to show the generalization capabilities (over learning) of the proposed approaches on a hidden continual learning benchmark.

The challenge is hosted on the Codalab platform.


Dataset & Challenge Tracks

The challenge will be based on the CORe50 dataset and composed of three tracks:

  • New Instances (NI): In this setting 8 training batches of the same 50 classes are encountered over time. Each training batch is composed of different images collected in different environmental conditions.
  • Multi-Task New Classes (Multi-Task-NC)*: In this setting the 50 different classes are split into 9 different tasks: 10 classes in the first task and 5 classes in each of the other 8. In this case the task label will be provided during both training and testing.
  • New Instances and Classes (NIC): This protocol is composed of 391 training batches containing 300 images of a single class. No task label will be provided and each batch may contain images of a class seen before as well as a completely new class (a sketch of the shared training loop follows below).

We do not expect each participant to necessarily submit a solution that works for all of them. Each participant may decide to compete in one or more tracks, but will automatically appear in all 4 separate rankings (ni, multi-task-nc, nic, and the average over all of them).

*Multi-Task-NC constitutes a simplified variation of the originally proposed New Classes (NC) protocol (where the task label is not provided during train and test).
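
The three tracks only differ in how the training batches are organized and in whether a task label is available. As a rough illustration (not the official loader, which ships with the starting repository linked below), the resulting training loop can be sketched as follows; `train_batches`, `test_x` and `strategy` are hypothetical placeholders, and only the batch counts are taken from the track descriptions above.

```python
# A hedged sketch of the training loop shared by the three protocols.
# `train_batches`, `test_x` and `strategy` are hypothetical placeholders:
# the official starting repository provides its own data loader and helpers.

N_TRAIN_BATCHES = {"ni": 8, "multi-task-nc": 9, "nic": 391}  # from the track descriptions

def run_protocol(scenario, train_batches, test_x, strategy):
    """Train incrementally over the batches of one track, then predict on the test set."""
    assert scenario in N_TRAIN_BATCHES
    for train_x, train_y, task_label in train_batches:
        # task_label is only provided (and meaningful) for the multi-task-nc
        # track; for ni and nic no task supervision is available.
        strategy.train_on_batch(train_x, train_y, task_label)
    # the final test accuracy (metric 1) is computed only once, at the end
    return strategy.predict(test_x)
```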


Prizes

There will be four monetary prizes:

  • $800 for the participant with the highest average score across the three tracks (ni, multi-task-nc, nic)
  • $500 for the participant with the highest score on the ni track
  • $500 for the participant with the highest score on the multi-task-nc track
  • $500 for the participant with the highest score on the nic track

These prizes are kindly sponsored by Intel Labs (China).

Other top participants in the competition may also receive swag sponsored by Element AI, Continual AI, or Intel Labs (China). (TBD)


Rules

  1. The challenge will be articulated in two different phases: i) a pre-selection phase and ii) the final evaluation. Teams passing the pre-selection phase will be invited to join the workshop and will be required to submit a short paper (and a poster) describing their approach. Winning teams will be announced at the end of the final evaluation.
  2. The challenge will be based on the Codalab platform. For the pre-selection phase, each team will be asked to run the experiments locally on their own machines with the help of a Python repository that makes it easy to load the data and generate the submission file (together with all the data necessary to execute the submission remotely and verify adherence to the competition rules if needed). The submission file, once uploaded, will be used to compute the CL_score (see below), which will determine the ranking in the main scoreboard (please note that images in the test set are not temporally coherent).
  3. It is possible to optimize the data loader, but not to change the data order or the protocol itself. Keep in mind that this may differ from the “native” CORe50 data loader and protocols.
  4. The ranking for each leaderboard will be based on the aggregation metric (CL_score) described in the following section. Please note that the collection of the metadata needed to compute the CL_score is mandatory and should respect the frequency requested for each metric.
  5. The top 10 teams in the scoreboard at the end of the first phase will be selected for the final evaluation.
  6. The final evaluation consists of a remote evaluation of the final submission of each team. This is to make sure the final ranking is computed in the same computational environment for a fair comparison. In this phase, experiments will be run remotely for all the teams on a Linux system with 32 CPU cores, 1 NVIDIA Titan X GPU, and 64 GB of RAM. The maximum running time will be capped at 5 hours for each submission (covering all the tracks), and the training batch order of each protocol will be randomly shuffled for the final test.
  7. Each team selected for the final evaluation should submit a single dockerized solution containing the exact same solution submitted for the last Codalab evaluation. The Docker image (we will prepare an initial one for you) can be customized at will but should not exceed 5 GB.
  8. A few bonus points will be awarded to each team whose algorithm can also learn well on a hidden (but similar) continual learning benchmark, with a threshold on the CL_score to be decided later by the challenge chairs. This is fundamental to verify whether the proposed CL approach overfits the given dataset.


Metrics

Each solution will be evaluated across a number of metrics:


  1. Final Accuracy on the Test Set*: should be computed only at the end of the training.
  2. Average Accuracy Over Time on the Validation Set*: should be computed at every batch/task.
  3. Total Training/Test Time: total running time from start to end of the main function (in minutes).
  4. RAM Usage: total memory occupation of the process and any of its sub-processes. Should be computed at every epoch (in MB).
  5. Disk Usage: only additional data produced during training (such as replay patterns) plus any pre-trained weights. Should be computed at every epoch (in MB).
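
Metrics 4 and 5 are collected at every epoch. The official metadata-collection utilities will be provided in the Python repository; the snippet below is only a hedged illustration of what they measure, assuming psutil is available and that additional data (e.g. replay patterns) is written to a hypothetical `extra_data_dir`.

```python
# Illustrative per-epoch bookkeeping for metrics 4 (RAM) and 5 (disk).
# Not the official collection code; psutil and `extra_data_dir` are assumptions.
import os
import psutil

ram_usage_mb = []   # one entry per epoch (metric 4)
disk_usage_mb = []  # one entry per epoch (metric 5)

def record_epoch_usage(extra_data_dir="replay_data/"):
    """Record the RAM of this process (and sub-processes) and the extra disk space used."""
    process = psutil.Process(os.getpid())
    rss = process.memory_info().rss
    for child in process.children(recursive=True):
        rss += child.memory_info().rss
    ram_usage_mb.append(rss / (1024 ** 2))

    # size of additional data produced during training (e.g. replay patterns)
    total_bytes = 0
    for root, _, files in os.walk(extra_data_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    disk_usage_mb.append(total_bytes / (1024 ** 2))
```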

Pre-selection aggregation metric (Codalab <Ranking>): In the Codalab leaderboard, a global "ranking" is shown, which averages the test accuracy across all the submitted tracks for each participant. This metric is only useful for participants that compete in all three tracks simultaneously.

Final aggregation metric (CL_score): weighted average of metrics 1-5 (with weights 0.3, 0.1, 0.15, 0.125, 0.125, respectively; the weight values may be subject to change at the challenge board's discretion).
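
As an illustration only: once each metric has been mapped to a comparable "higher is better" score (the normalization is handled by the organizers and is not specified here), the aggregation reduces to a weighted sum with the weights listed above.

```python
# Hedged sketch of the CL_score aggregation; the normalization of each metric
# to a [0, 1] "higher is better" score is assumed to happen beforehand.
WEIGHTS = {
    "test_acc": 0.3,       # 1. final accuracy on the test set
    "val_acc_time": 0.1,   # 2. average accuracy over time on the validation set
    "train_time": 0.15,    # 3. total training/test time
    "ram_usage": 0.125,    # 4. RAM usage
    "disk_usage": 0.125,   # 5. disk usage
}

def cl_score(normalized_metrics):
    """Weighted average of the (already normalized) metrics, using the weights above."""
    return sum(w * normalized_metrics[name] for name, w in WEIGHTS.items())
```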

N.B.: Only test accuracy will be considered in the ranking of the pre-selection phase, since experiments will be run on the participants' local hardware; the full CL_score will be taken into account in the final evaluation.


*Accuracy in CORe50 is computed on a fixed test set. The rationale behind this choice is explained in [Lomonaco, 2017].


Submission file

The submission file should be a zip file containing:

  • Three directories: "ni", "multi-task-nc" and "nic", or just one of them, depending on your participation in the challenge categories.
  • Each directory should contain:
    • A directory named “code_snapshot” with the code to generate the results.
    • test_preds.txt: a list of predicted labels for the test set separated by "\n"
    • metadata.txt: containing 6 floats separated by "\n", representing: average accuracy on the validation set over time, total training time in minutes, average and max RAM usage (in MB), and average and max disk usage (in MB).

An example on how to generate the submission file will be made available in the Python repository of the challenge.
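
Until that example is released, the expected layout can be roughly sketched as follows; the helper functions and variable names are hypothetical, while the file names and contents follow the list above.

```python
# Hedged sketch of the submission layout described above; helper names are hypothetical.
import os
import zipfile

def write_track_files(track_dir, test_preds, metadata_values):
    """Write code_snapshot/, test_preds.txt and metadata.txt for one track directory."""
    os.makedirs(os.path.join(track_dir, "code_snapshot"), exist_ok=True)
    with open(os.path.join(track_dir, "test_preds.txt"), "w") as f:
        f.write("\n".join(str(int(p)) for p in test_preds))
    with open(os.path.join(track_dir, "metadata.txt"), "w") as f:
        # 6 floats: avg validation accuracy over time, train time (minutes),
        # avg and max RAM usage (MB), avg and max disk usage (MB)
        f.write("\n".join("{:f}".format(v) for v in metadata_values))

def make_submission(zip_name="submission.zip", tracks=("ni", "multi-task-nc", "nic")):
    """Zip the per-track directories (only those you actually populated)."""
    with zipfile.ZipFile(zip_name, "w") as zf:
        for track in tracks:
            for root, _, files in os.walk(track):
                for name in files:
                    zf.write(os.path.join(root, name))
```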


Important dates

  • Beginning of the pre-selection phase (release of data and baselines): 15th Feb 2020
  • Pre-selection phase ends: 3rd May 2020
  • Dockerized solution + report submission deadline: 8th May 2020
    • Please include your dockerized solution (carefully read the instructions here) and your 4-page report (using the CVPR template) in a single zip file. The final archive should be uploaded to a file sharing service of your choice and a share link sent to vincenzo.lomonaco@unibo.it via email. The link must allow direct access to the submission archive so that the download can be completed without having to register with the chosen file sharing service. Use "CLVision Challenge Submission" followed by your Codalab account username as the subject of your email. Also, please include the full list of your team members in the email body.
  • Final evaluation starts: 9th May 2020
  • Final ranking will be disclosed at the workshop on the 14th of June 2020.


The official starting repository for the CVPR 2020 CLVision challenge on *Continual Learning for Computer Vision* can be found here; it contains:

  • Two scripts to set up the environment and generate the zip submission file.
  • A complete working example to: 1) load the data and set up the continual learning protocols; 2) collect all the metadata during training; 3) evaluate the trained model on the validation and test sets.
  • A starting Dockerfile to simplify the final submission at the end of the first phase.


You just have to write your own continual learning strategy (even with just a couple of lines of code!) and you are ready to participate.
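
As a concrete (and deliberately simple) example of what such a strategy could look like, below is a minimal sketch of a naive rehearsal baseline in plain PyTorch. It is not the official baseline shipped with the repository; the class, its interface and the hyper-parameters are hypothetical.

```python
# A minimal sketch of a rehearsal-based continual learning strategy.
# NOT the official baseline: the class, its methods and the hyper-parameters
# below are hypothetical placeholders.
import random

import torch
import torch.nn.functional as F


class RehearsalStrategy:
    """Naive fine-tuning plus a small replay buffer of past examples."""

    def __init__(self, model, buffer_size=1500, lr=0.01, epochs=1, device="cpu"):
        self.model = model.to(device)
        self.buffer = []  # (image, label) pairs sampled from past batches
        self.buffer_size = buffer_size
        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=lr)
        self.epochs = epochs
        self.device = device

    def train_on_batch(self, x, y, task_label=None):
        """Train on the current batch mixed with examples replayed from the buffer."""
        new_x, new_y = x, y
        replay = random.sample(self.buffer, min(len(self.buffer), len(x)))
        if replay:
            x = torch.cat([x, torch.stack([p[0] for p in replay])])
            y = torch.cat([y, torch.stack([p[1] for p in replay])])

        self.model.train()
        for _ in range(self.epochs):
            perm = torch.randperm(len(x))
            for i in range(0, len(x), 32):  # mini-batches of 32
                idx = perm[i:i + 32]
                xb, yb = x[idx].to(self.device), y[idx].to(self.device)
                self.optimizer.zero_grad()
                loss = F.cross_entropy(self.model(xb), yb)
                loss.backward()
                self.optimizer.step()

        # Randomly replace buffer entries with examples from the new batch.
        for xi, yi in zip(new_x.cpu(), new_y.cpu()):
            if len(self.buffer) < self.buffer_size:
                self.buffer.append((xi, yi))
            else:
                self.buffer[random.randrange(self.buffer_size)] = (xi, yi)

    @torch.no_grad()
    def predict(self, x):
        """Return predicted labels for a tensor of test images."""
        self.model.eval()
        preds = []
        for i in range(0, len(x), 32):
            logits = self.model(x[i:i + 32].to(self.device))
            preds.append(logits.argmax(dim=1).cpu())
        return torch.cat(preds)
```

Such a strategy would simply be called on each incoming training batch of the chosen protocol, mirroring the loop sketched in the Dataset & Challenge Tracks section above.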


News and Updates

Q&As, news and competition rules updates will be listed on this page.

In case of any question or doubt you can contact us via email at vincenzo.lomonaco AT unibo, or join the ContinualAI Slack workspace in the #clvision-workshop channel to ask your questions and stay updated about the progress of the competition.


References

[Lomonaco, 2017] Vincenzo Lomonaco and Davide Maltoni. "CORe50: a new Dataset and Benchmark for continual Object Recognition". Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:17-26, 2017.

[Parisi, 2019] Parisi, German I., et al. "Continual lifelong learning with neural networks: A review." Neural Networks (2019).