Challenge Results

Summary of the Challenge

0-intro.mp4

Top Performing Teams

1st place: Team xudan7

Description

Our method combines the strengths of Hard Attention to the Task (HAT) and Supervised Contrastive Learning in the context of Class-Incremental with Repetition (CIR). We create a partitioned network using trainable hard masks aligned with the experience ID, which lets us selectively update only the network parameters associated with the current experience during backpropagation, thereby mitigating catastrophic forgetting. We address HAT's limitations with large numbers of experiences by reinitializing the hard attention masks and aligning gradients to stabilize training. Training involves two stages: (1) representation learning with a supervised contrastive loss, and (2) classification learning focused on the current experience's classes with normalized logits. Replay embeddings are introduced with an additional logit head representing the out-of-experience class. During inference, each sample undergoes test-time augmentation and is evaluated across multiple experiences, each of which is a potential candidate for all 100 classes. We compute the final prediction as a weighted average over these experiences. Our single-model submission achieves an average accuracy of 40.19% on configurations 1, 2, and 3 during pre-selection. In the final phase, we explore complementary approaches using replica sub-models under memory constraints. Specifically, two strategies, one employing specialized models for different experiences and the other using ensemble models trained on the same experience, have demonstrated promising results in further enhancing model performance.
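
A minimal sketch of the inference-time aggregation described above, assuming a hypothetical `logits_for_experience(x, exp_id)` callable that returns 100-class logits from the network partition masked for a given experience, a list of augmentation callables, and placeholder experience weights; this is an illustration, not the team's implementation.

```python
import numpy as np

def predict(sample, logits_for_experience, augmentations, experience_ids, exp_weights):
    """Weighted average over per-experience predictions with test-time
    augmentation (illustrative sketch, not the team's actual code)."""
    num_classes = 100
    total = np.zeros(num_classes)
    for exp_id, w in zip(experience_ids, exp_weights):
        per_exp = np.zeros(num_classes)
        for augment in augmentations:
            # 100-class logits from the partition tied to exp_id
            per_exp += logits_for_experience(augment(sample), exp_id)
        per_exp /= len(augmentations)   # average over augmentations
        total += w * per_exp            # weight this experience's vote
    total /= sum(exp_weights)
    return int(np.argmax(total))
```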

Team Members

Xiaotian Duan, Zixuan Zhao, Fangfang Xia

Affiliation

University of Chicago

1-xudan7.mp4

2nd place: Team TJU_MLDM

Name on the leaderboard: Team linzz

Description

In summary, we propose a parameter-isolation method combined with an ensemble learning strategy. To achieve better model integration, we align the classifiers so that they maintain the same energy level across the different training phases. To address the issue of sample efficiency, we use self-supervised learning to capture more general and discriminative representations, thus improving generalization performance. Additionally, we establish a shared prompt pool to facilitate interaction between different tasks and categories, promoting knowledge fusion.
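
As a rough illustration of aligning classifiers trained in different phases before ensembling, the sketch below rescales each member's logits to a common mean magnitude; the "energy" definition and the averaging rule here are assumptions for illustration and may differ from the team's exact procedure.

```python
import numpy as np

def align_energy_and_average(logits_per_member):
    """Rescale each classifier's logits so their mean absolute value
    ("energy") matches a shared target, then average the aligned logits.
    Illustrative sketch only; the team's precise energy measure may differ."""
    energies = [np.abs(l).mean() for l in logits_per_member]
    target = float(np.mean(energies))
    aligned = [l * (target / (e + 1e-8)) for l, e in zip(logits_per_member, energies)]
    return np.mean(aligned, axis=0)
```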

Team Members

Zhilin Zhu, Yan Fan, Huitong Chen, Luona Ji, Xinjie Yao, Junxian Mu, Yu Wang, Pengfei Zhu

Affiliation

Tianjin University

2-linzz.mp4

3rd place: Team mmasana

Description

The core idea is to learn an ensemble of Feature Extractors (FEs) on selected experiences, which should provide robust features useful for discriminating downstream classes. A set of heuristics decides at each experience whether to learn a new FE for the current classes: we do not consider experiences with fewer than 5 classes, we stop adding FEs to the ensemble after 85% of the classes have been seen, and we always train an FE on the first experience. Each FE is learned with both a cross-entropy head and, on a separate head, a contrastive loss with emphasis on hard negative pairs. The two losses are balanced with an adaptive alpha, computed automatically from the energy of each loss. After learning from the current experience, the heads are removed and the backbone is frozen and added to the rest of the ensemble. However, since the FEs have no knowledge of each other, we need to align their representations for unified classification using pseudo-feature projection. Therefore, regardless of whether an FE is trained, we always update the unified head on the representations of the whole ensemble. To balance the unified head, we draw inspiration from FeTrIL, extending the pseudo-feature projection to also estimate the standard deviation and to use the outputs of the whole ensemble. When the mean and standard deviation of a class are not available, we replace them by re-using the current-experience representations.
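
A hedged sketch of the pseudo-feature projection with standard deviations: current-experience features are standardized and then shifted and scaled with the stored statistics of an old class to produce surrogate features for training the unified head. The function name and the exact projection rule are assumptions for illustration, not the team's code.

```python
import numpy as np

def project_pseudo_features(current_feats, old_class_mean, old_class_std):
    """FeTrIL-inspired pseudo-feature projection extended with standard
    deviations (illustrative sketch). `current_feats` has shape
    (n_samples, feat_dim); the returned array mimics samples of the old class."""
    cur_mean = current_feats.mean(axis=0)
    cur_std = current_feats.std(axis=0) + 1e-8
    standardized = (current_feats - cur_mean) / cur_std
    return standardized * old_class_std + old_class_mean
```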

Team Members

Benedikt Tscheschner, Eduardo Veas, Marc Masana

Affiliation

Graz University of Technology

3-mmasana.mp4

4th place: Team pddbend

Description

Our proposed method is a dynamic architecture based on gated networks. We set up an independent branch for each experience and control whether the current branch is activated through gating units. During training, the gating unit opens the branch for the current experience, allowing the network to learn, while closing the branches of previous experiences to freeze their parameters. To improve the model's generalization and robustness during training, we use a large number of data augmentation techniques, such as AugMix. During testing, the gating units control the branches to perform sequential predictions. However, considering all branch predictions simultaneously, as in DER, may not yield good results, because most branches may not have seen a given class and may make overconfident judgments. This problem can be seen as an open-set recognition problem. To address it, we propose a weighting strategy based on entropy, feature norm, and the number of classes. Specifically, for each branch we compute the entropy of the predicted probabilities; if the entropy is high, the sample is likely an open-set sample for that branch. Similarly, we compute the feature norm; if the feature norm is high, the sample is also likely an open-set sample. Finally, we assume that the more classes an experience has, the more reliable the model's judgment will be, so we also weight by the number of classes in the experience.
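
A minimal sketch of the branch-weighting rule, assuming each branch outputs logits over the full class set; the exact way entropy, feature norm, and class count are combined is an assumption made for illustration, with high entropy and high feature norm down-weighting a branch and more classes up-weighting it.

```python
import numpy as np

def combine_branches(branch_logits, branch_feat_norms, branch_class_counts):
    """Weight each branch's softmax output by a score that decreases with
    predictive entropy and feature norm (open-set indicators) and increases
    with the number of classes the branch was trained on. Illustrative only."""
    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    probs, weights = [], []
    for logits, feat_norm, n_cls in zip(branch_logits, branch_feat_norms, branch_class_counts):
        p = softmax(np.asarray(logits, dtype=float))
        entropy = -np.sum(p * np.log(p + 1e-12))
        # high entropy / high feature norm -> likely open-set for this branch -> lower weight
        weights.append(n_cls / ((1.0 + entropy) * (1.0 + feat_norm)))
        probs.append(p)
    weights = np.asarray(weights) / np.sum(weights)
    fused = sum(w * p for w, p in zip(weights, probs))
    return int(np.argmax(fused))
```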

Team Members

YuXiang Zheng, ShiJi Zhao, ShaoYuan Li, ShengJun Huang

Affiliation

Nanjing University of Aeronautics and Astronautics

4-pddbend.mp4