To answer RQ4, we split each of the datasets MNIST, FMNIST, CIFAR10, and ImageNet into two subtasks of five categories each, labeled task A and task B. For instance, in the CIFAR10 dataset, which contains ten distinct image categories, task A consists of the first five categories and task B of the remaining five. To simulate the process of incremental model development, we first train models exclusively on task A under six different experimental setups following this division. We then train these models on task B while aiming to preserve their performance on task A. The evaluation focuses on the accuracy on both tasks: the accuracy on task A measures how well the models retain knowledge of the original task without forgetting, while the accuracy on task B gauges the models' ability to learn new tasks.
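The following is a minimal sketch of how such a task split could be constructed, assuming torchvision's CIFAR10 loader; the class assignments and variable names are illustrative and not taken from the authors' code.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Load CIFAR10 and split its ten classes into two five-class subtasks.
transform = transforms.ToTensor()
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transform)

TASK_A_CLASSES = {0, 1, 2, 3, 4}   # first five categories -> task A
TASK_B_CLASSES = {5, 6, 7, 8, 9}   # remaining five categories -> task B

task_a_idx = [i for i, y in enumerate(cifar_train.targets) if y in TASK_A_CLASSES]
task_b_idx = [i for i, y in enumerate(cifar_train.targets) if y in TASK_B_CLASSES]

task_a_train = Subset(cifar_train, task_a_idx)
task_b_train = Subset(cifar_train, task_b_idx)
```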
In particular, we compare the following seven methods: naive retraining (i.e., training directly on the task B data), Replay 5% (i.e., the training data includes 5% of the task A data together with all of the task B data), Replay 10%, Replay 100%, DeepArc (i.e., training task B only on the redundant layers of the model), NeuSemSlice, and NeuSemSlice combined with Replay.
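A hedged sketch of the Replay k% data construction is shown below: a random k% sample of the task A data is mixed with all of the task B data. The sampling scheme and helper name are assumptions for illustration only; `task_a_train` and `task_b_train` follow the split sketched above.

```python
import random
from torch.utils.data import ConcatDataset, Subset, DataLoader

def build_replay_dataset(task_a_train, task_b_train, replay_ratio=0.05, seed=0):
    """Mix a replay_ratio fraction of task A data with all of task B."""
    rng = random.Random(seed)
    n_replay = int(replay_ratio * len(task_a_train))
    replay_idx = rng.sample(range(len(task_a_train)), n_replay)
    replay_subset = Subset(task_a_train, replay_idx)
    return ConcatDataset([replay_subset, task_b_train])

# Replay 5%, 10%, and 100% differ only in the fraction of task A data retained.
replay_5_loader = DataLoader(
    build_replay_dataset(task_a_train, task_b_train, replay_ratio=0.05),
    batch_size=128, shuffle=True)
```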
For the implementation of NeuSemSlice, we maintain a neuron mask that determines which neurons are selected. When performing task A, we apply the mask so that only the frozen parameters related to task A are used for inference, enabling accurate predictions while suppressing the output for task B. Conversely, for task B, we remove the mask so as to utilize the trained redundant parameters.
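The snippet below is a simplified sketch of this masking idea, not the actual NeuSemSlice implementation: a binary buffer marks the (hypothetical) task-A-critical neurons of a layer, the mask is applied during task A inference to silence the remaining redundant neurons, and it is dropped for task B so those redundant neurons can contribute and be trained.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose output can be restricted to task-A-critical neurons."""

    def __init__(self, in_features, out_features, task_a_mask):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # 1 for task-A-critical neurons, 0 for redundant neurons (hypothetical mask).
        self.register_buffer("task_a_mask", task_a_mask.float())

    def forward(self, x, use_mask: bool):
        out = self.fc(x)
        if use_mask:
            # Task A inference: keep only the frozen task-A neurons.
            out = out * self.task_a_mask
        # Task B: full output, so the redundant neurons can be trained.
        return out

layer = MaskedLinear(64, 32, task_a_mask=torch.tensor([1] * 16 + [0] * 16))
x = torch.randn(8, 64)
task_a_out = layer(x, use_mask=True)    # predictions restricted to task-A neurons
task_b_out = layer(x, use_mask=False)   # redundant neurons contribute for task B
```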