EcoTTA: Memory-Efficient Continual Test-time Adaptation
via Self-distilled Regularization


Junha Song1,2   Jungsoo Lee1   In So Kweon2   Sungha Choi1

1Qualcomm AI Research   2KAIST

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

[📖 arXiv] [▶️ Presentation]
[💻 Code (Thanks, Lele Chang!)]

Summary

🤔 Test-time adaptation (TTA) is a cutting-edge AI capability that allows a deployed model to adapt itself to a new environment during the testing phase.
😮 Our work aims to make TTA applicable and practical on edge devices (e.g., robots or autonomous vehicles) through the following approach.
🤠 We design a memory-efficient architecture that reduces memory usage (i.e., activation storage) by up to 86% compared to state-of-the-art methods. Moreover, our regularization prevents overfitting by leveraging the knowledge acquired during pre-training, which is distilled from the frozen original model.

Poster 🔗

Abstract

This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. TTA is primarily conducted on edge devices with limited memory, so reducing memory consumption is crucial but has been overlooked in previous TTA studies. In addition, long-term adaptation often leads to catastrophic forgetting and error accumulation, which hinder applying TTA in real-world deployments.

Our approach consists of two components to address these issues. 

First, we present lightweight meta networks that can adapt the frozen original networks to the target domain. This novel architecture minimizes memory consumption by decreasing the size of intermediate activations required for backpropagation.

Second, our novel self-distilled regularization keeps the output of the meta networks from deviating significantly from the output of the frozen original networks, thereby preserving the well-trained knowledge from the source domain. Without additional memory, this regularization prevents error accumulation and catastrophic forgetting, resulting in stable performance even in long-term test-time adaptation.

We demonstrate that our simple yet effective strategy outperforms other state-of-the-art methods on various benchmarks for image classification and semantic segmentation tasks. Notably, our proposed method with ResNet-50 and WideResNet-40 requires 86% and 80% less memory, respectively, than the recent state-of-the-art method CoTTA.

Approach

1) What are activations?

Existing TTA works update model parameters to adapt to the target domain. This process inevitably requires additional memory to store activations: the intermediate features saved during forward propagation and used for gradient calculation during backpropagation. Note that only learnable layers, not frozen layers, must store these activations. Based on this observation, we develop a memory-efficient architecture that minimizes the activation size by updating only a few layers, in contrast to previous methods that require updating a large number of layers.
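To make this concrete, here is a minimal PyTorch sketch (ours, not the authors' code) that counts the bytes autograd saves for the backward pass; the toy model, input size, and helper name `saved_activation_bytes` are illustrative assumptions. Freezing all but the last block sharply shrinks what must be cached:

```python
import torch
import torch.nn as nn

def saved_activation_bytes(model, x):
    """Forward pass that totals the bytes autograd saves for backward."""
    total = [0]
    def pack(t):  # called for every tensor autograd decides to keep
        total[0] += t.numel() * t.element_size()
        return t
    with torch.autograd.graph.saved_tensors_hooks(pack, lambda t: t):
        model(x)
    return total[0]

def block(c=64):
    return nn.Sequential(nn.Conv2d(c, c, 3, padding=1),
                         nn.BatchNorm2d(c), nn.ReLU())

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),
                      *[block() for _ in range(4)])
x = torch.randn(8, 3, 32, 32)

print("all layers learnable:", saved_activation_bytes(model, x))

# Freeze everything except the last block: layers that no longer need
# parameter gradients do not have to cache their inputs.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

print("last block learnable:", saved_activation_bytes(model, x))
```

EcoTTA exploits exactly this effect: only a few small meta networks are learnable, so the activation cache stays small.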

2) Memory-efficient Architecture & Self-distilled Regularization

Overview of our approach. [🔗 High resolution GIF] (a) The encoder of the pre-trained model is divided into K parts (i.e., model partition factor K). (b) Before deployment, the meta networks are attached to each part of the original networks and pre-trained with the source dataset D_s. (c) After the model is deployed, only the meta networks are updated with an unsupervised loss (i.e., entropy minimization) on the target data D_t, while the original networks are frozen. (d) To avoid error accumulation and catastrophic forgetting during long-term adaptation, we regularize the output x̃_k of each group of the meta networks using the output x_k of the frozen original networks, which preserves the source knowledge.
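As a rough illustration of steps (c) and (d), the sketch below runs one adaptation step. It is a hedged reconstruction, not the released implementation: `frozen_parts` (the K frozen encoder slices), `meta_parts` (the attached meta networks, combined here additively with each frozen output), `classifier`, and the weight `lam` are all assumed names and choices, and the paper's filtering of unreliable high-entropy samples is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def ecotta_step(frozen_parts, meta_parts, classifier, x, optimizer, lam=0.5):
    """One continual-TTA step: entropy minimization + self-distilled reg.

    Assumes the optimizer holds only the meta-network parameters and that
    all frozen modules already have requires_grad=False.
    """
    reg = 0.0
    feat = x
    for frozen_k, meta_k in zip(frozen_parts, meta_parts):
        x_k = frozen_k(feat)           # x_k: output of the frozen original part
        feat = x_k + meta_k(feat)      # x̃_k: adapted output of group k
        # Self-distilled regularization: keep x̃_k near the source-knowledge
        # output x_k (treated as a fixed target in this sketch).
        reg = reg + F.l1_loss(feat, x_k.detach())
    logits = classifier(feat)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    loss = entropy.mean() + lam * reg  # unsupervised loss + regularization
    optimizer.zero_grad()
    loss.backward()                    # gradients reach only the meta networks
    optimizer.step()
    return logits.detach()
```

Note that x_k is already computed in the forward pass, which mirrors the paper's claim that the regularization requires no additional memory.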


Experimental Results

1) Classification Experiments

Comparison of error rate (%) on CIFAR-C. We report the average error over 15 corruptions in continual TTA and the memory requirement, including model parameters and activation sizes. The lowest error is in bold, and the second lowest is underlined. The memory reduction rates relative to CoTTA and TENT are presented sequentially. WideResNet-40 was pre-trained with AugMix, a data augmentation technique that improves model robustness. Source denotes the pre-trained model without adaptation. Single-domain TENT (in short, single do.) resets the model when adapting to a new target domain, so domain labels are required. Experiments on ImageNet-C are provided in the main paper (Table 2).

2) Segmentation Experiments

Semantic segmentation results on continual test-time adaptation. We conduct experiments on Cityscapes with four weather corruptions applied. The four conditions are repeated ten times to simulate continual domain shifts. All results are evaluated with DeepLabV3Plus-ResNet-50.

3) Ablation Study 1

Architecture ablation experiments. (a, b) We compare continual TTA performance across several memory-efficient designs; WRN refers to the WideResNet backbone. (c) We report performance for different ways of partitioning the model. The value next to the backbone's name denotes the total number of residual blocks in the model.

4) Ablation Study 2

Regularization ablation experiments. We conduct experiments with WideResNet-40 on CIFAR100-C. (a) We use the test set of the CIFAR-100 dataset to measure the clean error after adapting to each corruption. Keeping the clean error stable indicates that our approach makes the model robust to catastrophic forgetting. (b) We simulate a long-term adaptation scenario by repeating the sequence of 15 corruptions for 100 rounds. Without regularization, error accumulation leads to overfitting (i.e., the error increases sharply). Our approach does not suffer from such error accumulation. We set K to 5 in the above experiments.

Cite

```bibtex
@inproceedings{song2023ecotta,
  title={EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization},
  author={Song, Junha and Lee, Jungsoo and Kweon, In So and Choi, Sungha},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```