Domain Adaptation

Unsupervised Domain Adaptation

Distribution-based: aim to learn domain-invariant features by minimizing a distance metric between domains: KL divergence; MMD; Wasserstein distance; but these methods neglect the alignment of conditional distributions

  1. Baktashmotlagh, Mahsa, et al. "Domain adaptation on the statistical manifold." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.

    • Existing work: (1) sample reweighting: assign weights to the source samples and optimize those weights so as to minimize a distance measure between the re-weighted source and target distributions. (2) subspace-based techniques: try to find a linear transformation or projection of the source data, such that a distance measure between the transformed source and target distributions is minimized.

    • MMD: measures the dissimilarity between two distributions as their maximum difference in expectation over a set of functions.

    • MMD is a simple yet powerful non-parametric criterion that compares the distributions of two sets of data by mapping them to a reproducing kernel Hilbert space (RKHS); a minimal estimator is sketched after these notes

    • Although the MMD is endowed with nice properties, according to [16], the choice of kernel and kernel parameters is critical when using it as a test statistic. Non-optimal choices can lead to very poor estimates of the distance between two distributions [16]. Furthermore, it does not truly consider the geometry of the space of probability distributions. From information geometry, we know that probability distributions lie on a Riemannian manifold known as the statistical manifold

    • we propose to make use of the Hellinger distance, which is closely related to the Fisher-Rao metric
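
Since several of the methods above rely on MMD, here is a minimal biased RBF-kernel estimator in PyTorch. This is a generic illustration, not code from any paper above; the bandwidth `sigma` is a hand-picked assumption, which is exactly the kernel-choice sensitivity criticized in [1].

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased estimate of squared MMD between batches x (n, d) and y (m, d)
    with a Gaussian RBF kernel; sigma is an assumed, hand-tuned bandwidth."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In distribution-based DA, `mmd_rbf(source_feats, target_feats)` is typically added to the task loss as a weighted penalty.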

  2. K. Saito, Y. Ushiku, and T. Harada. Asymmetric tri-training for unsupervised domain adaptation. In ICML, 2017.

    • The marginal distribution and conditional distribution can also be jointly aligned with a combined MMD

  3. O. Sener, H. O. Song, A. Saxena, and S. Savarese. Learning transferrable representations for unsupervised domain adaptation. In Advances in Neural Information Processing Systems, pages 2110–2118, 2016

    • discriminative property of representation

    • considering the categorical semantic compactness and separability

Task-oriented learning: these methods tend to align the domain discrepancy in an adversarial style.

    1. Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In ICML, 2015 : proposed to learn domain-invariant features by using an adversarial loss that reverses the gradients during back-propagation (see the sketch after this list)

    2. Tzeng, Eric, et al. "Adversarial discriminative domain adaptation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

    3. K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, 2017 : enabled the network to separate the generated features into domain-specific subspace and domain-invariant subspace
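
The gradient reversal trick of [1] is small enough to sketch in full. Below is a common PyTorch re-implementation, an illustration of the idea rather than the authors' code:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lam in the
    backward pass, so the feature extractor learns to fool the domain
    classifier while the classifier learns to separate domains."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# typical wiring: domain_logits = domain_classifier(grad_reverse(features))
```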

Graph-based Alignment:

  1. Ma, Xinhong, Tianzhu Zhang, and Changsheng Xu. "GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

    • Structure-aware Alignment: construct a densely-connected instance graph. Each node corresponds to CNN features of a sample

    • Domain Alignment: adversarial similarity loss

    • Class Centroid Alignment: moving centroids (a generic sketch follows)
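
"Moving centroids" is usually implemented as exponential-moving-average class centroids that are pulled together across domains. The sketch below makes that concrete under assumed details; the momentum value and squared-distance loss are illustrative, not GCAN's exact formulation:

```python
import torch

def update_centroids(centroids, feats, labels, momentum=0.7):
    """EMA update of per-class centroids from one batch.
    centroids: (num_classes, d); labels may be pseudo-labels on the target."""
    for c in labels.unique():
        centroids[c] = momentum * centroids[c] + (1 - momentum) * feats[labels == c].mean(0)
    return centroids

def centroid_alignment_loss(src_centroids, tgt_centroids):
    # pull corresponding class centroids of the two domains together
    return (src_centroids - tgt_centroids).pow(2).sum(dim=1).mean()
```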

3D Domain Adaptation

Cross-modal:

  1. Jaritz, Maximilian, et al. "xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

Single-modal:

  1. Wang, Yan, et al. "Train in germany, test in the usa: Making 3d object detectors generalize." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

    1. normalize the object size of the source domain leveraging the object statistics of the target domain to close the size-level domain gap

    2. Though the performance has been improved, the method needs the target statistics information, and its effectiveness depends on the source and target data distributions.

  2. Baek, Seungryul, Kwang In Kim, and Tae-Kyun Kim. "Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

  3. Qin, Can, et al. "PointDAN: A multi-scale 3D domain adaption network for point cloud representation." Advances in Neural Information Processing Systems. 2019.

    1. DA for 3D object classification

    2. Local alignment: self-adaptive; Global alignment: adversarial

    3. benchmark: PointDA-10

  4. Zhou, Xingyi, et al. "Unsupervised domain adaptation for 3d keypoint estimation via view consistency." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

DA for 3D Object Detection

  1. Wang, Yan, et al. "Train in germany, test in the usa: Making 3d object detectors generalize." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

    • major bias: statistical differences in the sizes and shapes of cars

    • solution: leverage aggregate statistics to correct the bias

    • poor generalization resides primarily in localization: inaccurate box size

    • Either by few-shot fine-tuning or by statistical normalization (assuming target car-size statistics are available; see the sketch below)
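
A minimal reading of statistical normalization, assuming boxes are (N, 7) arrays with columns (x, y, z, w, l, h, yaw) and that mean target car dimensions are available; the paper's full procedure, which also adjusts the points inside each box, is more involved:

```python
import numpy as np

def normalize_box_sizes(boxes, src_mean_wlh, tgt_mean_wlh):
    """Shift each box's (w, l, h) by the gap between the domains' mean car
    sizes to close the size-level gap. The box layout is an assumption here."""
    boxes = boxes.copy()
    boxes[:, 3:6] += np.asarray(tgt_mean_wlh) - np.asarray(src_mean_wlh)
    return boxes
```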


  2. Yang, Jihan, et al. "ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection." CVPR 2021 (arXiv:2103.05346).

  3. Du, Liang, et al. "Associate-3ddet: perceptual-to-conceptual association for 3d point cloud object detection." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

  4. Saltori, Cristiano, et al. "SF-UDA$^{3D}$: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection." arXiv preprint arXiv:2010.08243 (2020).

  5. Achituve, Idan, Haggai Maron, and Gal Chechik. "Self-Supervised Learning for Domain Adaptation on Point Clouds." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021.

Open-set Domain Adaptation

  • Main Challenges: (1) unknown-class separation (openness); (2) distribution difference

  1. Panareda Busto, Pau, and Juergen Gall. "Open set domain adaptation." Proceedings of the IEEE International Conference on Computer Vision. 2017.

      • Assign-and-Transform-Iteratively: recognizes target samples via constrained integer programming; learns a linear mapping to match the source and target domains while excluding predicted-unknown target samples.

      • Assumption: source domain contains unknown samples

  2. Saito, Kuniaki, et al. "Open set domain adaptation by backpropagation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

      • rejects unknown target samples by training a binary classifier

      • new setting: where unknown samples only exist in the target domain

      • regarded the unknown samples as a separate class together with an adversarial loss to distinguish them.

      • It is worth noting that the existence of unknown samples hinders alignment across domains; meanwhile, inter-class misalignment across domains also makes it harder to distinguish the unknown samples. [comments from paper #5] A rough sketch of the boundary loss follows.
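
Roughly, OSBP trains a (K+1)-way classifier whose last logit means "unknown": the classifier pulls the target unknown probability toward a boundary t (0.5 in the paper), while the feature extractor pushes it away through a reversed gradient. A hedged sketch of that boundary loss:

```python
import torch

def osbp_adv_loss(target_logits, t=0.5):
    """Cross-entropy between the target 'unknown' probability and boundary t;
    minimized by the classifier and maximized by the feature extractor
    (via gradient reversal). Assumes the last class index is 'unknown'."""
    p_unk = target_logits.softmax(dim=1)[:, -1].clamp(1e-6, 1 - 1e-6)
    return -(t * p_unk.log() + (1 - t) * (1 - p_unk).log()).mean()
```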

  3. Baktashmotlagh, Mahsa, et al. "Learning factorized representations for open-set domain adaptation." ICLR (2018).

      • Known uninteresting setting

      • Extract the principal component from known and unknown classes

      • adds a sparsity constraint on the responses and identifies known samples via the ratio test ||T^v|| / ||T^u|| <= threshold

      • separated the unknown according to whether the sample can be reconstructed with the shared feature or not

  4. Liu, Hong, et al. "Separate to Adapt: Open Set Domain Adaptation via Progressive Separation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

      • Task: Open-set Domain Adaptation

      • Challenges: (1) how to mitigate domain shift; (2) negative transfer

      • How to tackle the unseen classes is the key

  5. Feng, Qianyu, et al. "Attract or distract: Exploit the margin of open set." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

    • starts with a mild alignment via a procedure similar to OSBP, then refines the decision by using metric learning to reduce the intra-class distance within known classes and push the unknown class away from the known classes

  6. Tan, Shuhan, Jiening Jiao, and Wei-Shi Zheng. "Weakly supervised open-set domain adaptation by dual-domain collaboration." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

    • Both source and target samples have the labeled and unlabeled samples

    • perform pseudo-labeling for both domains

  7. Shermin, Tasfia, et al. "Adversarial Network with Multiple Classifiers for Open Set Domain Adaptation." IEEE Transactions on Multimedia (2020).

  8. Yingwei Pan et al. "Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation" , CVPR 2020

    • (1) semantic categorical alignment: achieve good separability of target known classes by aligning them with the corresponding source class centroids.

    • (2) semantic contrastive mapping: pushing unknown classes away from the decision boundary.

  9. Luo, Yadan, et al. "Progressive graph learning for open-set domain adaptation." International Conference on Machine Learning (ICML). PMLR, 2020.

  10. Bucci, Silvia, Mohammad Reza Loghmani, and Tatiana Tommasi. "On the Effectiveness of Image Rotation for Open Set Domain Adaptation." European Conference on Computer Vision. Springer, Cham, 2020.

  11. Kundu, Jogendra Nath, et al. "Towards inheritable models for open-set domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

  12. Kishida, Ikki, et al. "Object Recognition With Continual Open Set Domain Adaptation for Home Robot." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021.

  13. Fang, Zhen, et al. "Open set domain adaptation: Theoretical bound and algorithm." IEEE Transactions on Neural Networks and Learning Systems (2020).

  14. Loghmani, Mohammad Reza, Markus Vincze, and Tatiana Tommasi. "Positive-unlabeled learning for open set domain adaptation." Pattern Recognition Letters 136 (2020): 198-204.

  15. Jing, Mengmeng, et al. "Balanced Open Set Domain Adaptation via Centroid Alignment." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021.

    • propose a balanced OSDA method based on centroid alignment in a hyperspherical latent space

    • use the source centroids as the anchor centroids

    • bound the centroid deviation angle -> reduce intra-class variations and increase the inter-class gap (ranking loss, angular distance) -> close the domain gap

    • Extreme Value Theory (EVT) to recognize the unknown samples misclassified into known classes

    • Feature Encoder: s-VAE: take the source representations zs and the target representation zt as the prior of each other to enhance the domain invariance of the representations (not source-free)

Universal Domain Adaptation

  • You, Kaichao, et al. "Universal domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

  • Saito, Kuniaki, et al. "Universal domain adaptation through self supervision." arXiv preprint arXiv:2002.07953 (2020).

  • Kundu, Jogendra Nath, Naveen Venkat, and R. Venkatesh Babu. "Universal source-free domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

  • Fu, Bo, et al. "Learning to Detect Open Classes for Universal Domain Adaptation." European Conference on Computer Vision. Springer, Cham, 2020.

  • Lifshitz, Omri, and Lior Wolf. "A sample selection approach for universal domain adaptation." arXiv preprint arXiv:2001.05071 (2020).

Theoretical Analysis

  1. Chen, Xinyang, et al. "Transferability vs. discriminability: Batch spectral penalization for adversarial domain adaptation." International Conference on Machine Learning. 2019.

  2. Fang, Zhen, et al. "Open Set Domain Adaptation: Theoretical Bound and Algorithm." arXiv preprint arXiv:1907.08375 (2019).

  3. Zhao, Han, et al. "On learning invariant representation for domain adaptation." ICML 2019 [Paper]

  4. Zhang, Yuchen, et al. "Bridging Theory and Algorithm for Domain Adaptation." ICML 2019. [Paper] [Code]

  5. (label shift) Azizzadenesheli, Kamyar, et al. "Regularized learning for domain adaptation under label shifts." arXiv preprint arXiv:1903.09734 (2019). ICML 2019

  6. (covariate shift) You, Kaichao, et al. "Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation." International Conference on Machine Learning. 2019.

    • The formulations of domain adaptation mainly fall into two categories: covariate shift and label shift

  7. Zhang, Dexuan, and Tatsuya Harada. "A General Upper Bound for Unsupervised Domain Adaptation." arXiv preprint arXiv:1910.01409 (2019). [Paper]

  8. Hoffman, Judy, Mehryar Mohri, and Ningshan Zhang. "Algorithms and theory for multiple-source adaptation." Advances in Neural Information Processing Systems. 2018.

  9. (label shift) Lipton, Zachary C., Yu-Xiang Wang, and Alex Smola. "Detecting and correcting for label shift with black box predictors." arXiv preprint arXiv:1802.03916 (2018). ICML

  10. (covariate shift) Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." Domain Adaptation in Computer Vision Applications. Springer, Cham, 2017. 189-209.

  11. Gong, Mingming, et al. "Domain adaptation with conditional transferable components." International conference on machine learning. 2016.

    • certain assumptions must be imposed on how the distribution changes across domains.

    • covariate shift: only the marginal distribution differs; importance reweighting applies when the source domain is richer than the target domain

    • the weights are defined as the density ratio between the target and source densities, w(x) = p_T(x) / p_S(x) (a minimal estimator is sketched below)
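
The density ratio w(x) = p_T(x) / p_S(x) can be estimated with a probabilistic domain classifier, a standard trick that is not specific to this paper; a minimal scikit-learn sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(x_src, x_tgt):
    """w(x) = p_T(x)/p_S(x) = P(t|x)/P(s|x) * n_s/n_t, with P(t|x) taken from
    a logistic-regression domain classifier (source = 0, target = 1)."""
    X = np.vstack([x_src, x_tgt])
    d = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])
    p_t = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(x_src)[:, 1]
    return (p_t / (1 - p_t)) * (len(x_src) / len(x_tgt))
```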

  12. (covariate shift) Long, Mingsheng, et al. "Learning transferable features with deep adaptation networks." arXiv preprint arXiv:1502.02791 (2015).

  13. Zhang, Chao, Lei Zhang, and Jieping Ye. "Generalization bounds for domain adaptation." Advances in neural information processing systems. 2012.

  14. Mohri, Mehryar, and Andres Munoz Medina. "New analysis and algorithm for learning with drifting distributions." International Conference on Algorithmic Learning Theory. Springer, Berlin, Heidelberg, 2012.

  15. (covariate shift) Yu, Yaoliang, and Csaba Szepesvári. "Analysis of kernel mean matching under covariate shift." arXiv preprint arXiv:1206.4650 (2012).

  16. Ben-David, Shai, et al. "A theory of learning from different domains." Machine learning 79.1-2 (2010): 151-175.

  17. (covariate shift) Cortes, Corinna, Yishay Mansour, and Mehryar Mohri. "Learning bounds for importance weighting." Advances in neural information processing systems. 2010.

  18. Mansour, Yishay, Mehryar Mohri, and Afshin Rostamizadeh. "Domain adaptation with multiple sources." Advances in neural information processing systems. 2009.

    • the analysis deals with the related but distinct case of adaptation with multiple sources, where the target is a mixture of the source distributions

  19. Blitzer, John, et al. "Learning bounds for domain adaptation." Advances in neural information processing systems. 2008.

    • gave a bound on the error rate of a hypothesis derived from a weighted combination of the source data sets for the specific case of empirical risk minimization

  20. Ben-David, Shai, et al. "Analysis of representations for domain adaptation." Advances in neural information processing systems. 2007.

    • The first theoretical analysis of the domain adaptation problem; gave VC-dimension-based generalization bounds for adaptation in classification tasks.

    • The most significant contribution was the definition and application of a distance between distributions, the d_A distance

    • the d_A distance can be estimated from finite samples when the hypothesis class has finite VC dimension (a proxy estimate is sketched below)
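
In practice the d_A distance is commonly approximated by the "proxy A-distance" 2(1 - 2ε), where ε is the held-out error of a classifier trained to separate the two domains; the classifier choice and the split below are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def proxy_a_distance(x_src, x_tgt):
    """Proxy A-distance: the harder it is to tell the domains apart
    (err -> 0.5), the smaller the distance (-> 0)."""
    X = np.vstack([x_src, x_tgt])
    y = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
    err = 1.0 - LinearSVC(dual=False).fit(Xtr, ytr).score(Xte, yte)
    return 2.0 * (1.0 - 2.0 * err)
```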

Pseudo Labelling for DA

  1. Saito, Kuniaki, Yoshitaka Ushiku, and Tatsuya Harada. "Asymmetric tri-training for unsupervised domain adaptation." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.

  2. Sener, Ozan, et al. "Learning transferrable representations for unsupervised domain adaptation." Advances in Neural Information Processing Systems. 2016.

  3. Weinshall, Daphna, Gad Cohen, and Dan Amir. "Curriculum learning by transfer learning: Theory and experiments with deep networks." arXiv preprint arXiv:1802.03796 (2018).

  4. Chen, Chaoqi, et al. "Progressive Feature Alignment for Unsupervised Domain Adaptation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. [Paper] [Code]

  5. Liang, Jian, et al. "Exploring uncertainty in pseudo-label guided unsupervised domain adaptation." Pattern Recognition 96 (2019): 106996.
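
The entries above differ in how pseudo-labels are selected and corrected, but they share the same confidence-filtered backbone; a generic sketch, where the threshold and the loader interface are illustrative assumptions:

```python
import torch

@torch.no_grad()
def pseudo_label(model, tgt_loader, threshold=0.95):
    """Keep only target samples whose max softmax probability clears the
    threshold; assumes the loader yields (inputs, _) batches."""
    xs, ys = [], []
    for x, _ in tgt_loader:
        conf, pred = model(x).softmax(dim=1).max(dim=1)
        keep = conf > threshold
        xs.append(x[keep])
        ys.append(pred[keep])
    return torch.cat(xs), torch.cat(ys)
```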

Video Domain Adaptation

  • Tang, Jun, et al. "Cross-domain action recognition via collective matrix factorization with graph Laplacian regularization." Image and Vision Computing 55 (2016): 119-126.

    • learns a projection matrix for each domain to map all features into a common latent space

  • Xu, Tiantian, et al. "Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition." Image and Vision Computing 55 (2016): 127-137.

  • Jamal, Arshad, et al. "Deep Domain Adaptation in Action Space." BMVC. 2018.

    • the action videos in the target domain are modeled as a sequence of points on a latent subspace and adaptive kernels are successively learned between the source domain point and the sequence of target domain points on the manifold

  • Liu, An-An, et al. "Multi-domain and multi-task learning for human action recognition." IEEE Transactions on Image Processing 28.2 (2018): 853-867. (multi-view)

    • 1) extract domain-invariant information for multi-view and multi-modal action representation and 2) explore the correlations among multiple action categories.

    • In particular, the framework learns a group of co-embedding matrixes from various domains and forces multi-domain instances of the same action to have similar embedded representations.

  • Zhang, Xiao-Yu, et al. "Learning transferable self-attentive representations for action recognition in untrimmed videos with weak supervision." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.

    • transfer knowledge for action localization using MMD, yet only aligns video-level features.

  • Chen, Min-Hung, et al. "Temporal attentive alignment for large-scale video domain adaptation." Proceedings of the IEEE International Conference on Computer Vision. 2019.

    • Proposes to attentively adapt the segments that contribute the most to the overall domain shift by leveraging the entropy of the domain label predictor.

  • Pan, Boxiao, et al. "Adversarial Cross-Domain Action Recognition with Co-Attention." arXiv preprint arXiv:1912.10405 (2019). AAAI 2020

    • co-attention module

Blending Target Domain Adaptation

  • Peng, Xingchao, et al. "Domain agnostic learning with disentangled representations." arXiv preprint arXiv:1904.12347 (2019). [ICML2019] [Pytorch]

    • disentangle features into three parts: (1) domain-invariant (2) domain-specific (3) class-irrelevant

    • the idea is great but kind of an ensemble; the class-irrelevant part (the variances) should be taken into further consideration

    • core idea: (1) single out the variance and discard it; (2) use MINE (mutual information neural estimation) as a constraint

  • Chen, Ziliang, et al. "Blending-target domain adaptation by adversarial meta-adaptation networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019. [CVPR2019] [Pytorch]

    • leverage deep clustering to find potential "style cluster" in target

    • use adversarial learning to minimize the style gap (variance)

    • do normal adversarial domain adaptation

    • cons: how to guarantee that the learned clusters capture style rather than class? (maybe it is better to find a safe part for domain adaptation)

Multi-source DA

  • Ahmed, Sk Miraj, et al. "Unsupervised Multi-source Domain Adaptation Without Access to Source Data." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

  • Multi-Source Open-Set Deep Adversarial Domain Adaptation [ECCV2020]

  • Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation [ECCV2020]

  • Curriculum Manager for Source Selection in Multi-Source Domain Adaptation [ECCV2020]

  • Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [ECCV2020] [Pytorch]

  • Multi-Source Domain Adaptation for Text Classification via DistanceNet-Bandits [AAAI2020]

  • Roy, Subhankar, et al. "TriGAN: Image-to-Image Translation for Multi-Source Domain Adaptation." arXiv preprint arXiv:2004.08769 (2020). [arxiv]

  • Zhao, Sicheng, et al. "Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey." arXiv preprint arXiv:2002.12169 (2020).

  • Liang, Jian, et al. "Distant supervised centroid shift: A simple and efficient approach to visual domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

  • Peng, Xingchao, et al. "Moment matching for multi-source domain adaptation." Proceedings of the IEEE International Conference on Computer Vision. 2019. [ICCV2019] [Pytorch]

    • moment distance (an MMD-like, moment-related criterion) for multiple domains

    • fix G -> maximize the discrepancy of C1 and C2 (get two independent experts)

    • fix C1, C2 -> minimize the discrepancy (align features from different angles)

    • adaptation is performed by aligning the moments of feature distributions between each source-target pair (sketched below)
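
A minimal reading of moment matching for one source-target pair; M3SDA averages a distance of this kind over all pairs of domains, and the paper's k-th order moment terms differ in detail:

```python
import torch

def moment_distance(fs, ft):
    """Distance between the first two feature moments of a source batch fs
    and a target batch ft, both of shape (n, d); an illustrative form only."""
    first = (fs.mean(0) - ft.mean(0)).norm()
    second = ((fs.t() @ fs) / fs.size(0) - (ft.t() @ ft) / ft.size(0)).norm()
    return first + second
```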

  • Schoenauer-Sebag, Alice, et al. "Multi-Domain Adversarial Learning" [ICLR2019] [Torch]

    • Multi-domain learning (MDL) aims to learn a model of minimal risk from datasets drawn from distinct underlying distributions (Dredze et al., 2010), and is a particular case of transfer learning

    • Benefits:

      • leverages more (labeled and unlabeled) information, allowing better generalization while accommodating the specifics of each domain

      • MDL models have a higher chance of ab initio performing well on a new domain − a problem referred to as domain generalization (Muandet et al., 2013) or zero-shot domain adaptation (Yang & Hospedales, 2015)

      • Learning a single model from samples drawn from n distributions raises the question of available learning guarantees regarding the model error on each distribution

      • MULANN handles the so-called class asymmetry issue (when each domain may contain varying numbers of labeled and unlabeled examples of a subset of all possible classes)

      • For instance, if a domain has unlabeled samples from a class which is not present in the other domains, both global (Ganin et al., 2016) and class-wise (Pei et al., 2018) domain alignments will likely deteriorate at least one of the domain risks by putting the unlabeled samples close to labeled ones from the same domain.

  • Wang, Haotian, et al. "TMDA: Task-Specific Multi-source Domain Adaptation via Clustering Embedded Adversarial Training." 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2019.

  • Hoffman, Judy, Mehryar Mohri, and Ningshan Zhang. "Algorithms and theory for multiple-source adaptation." Advances in Neural Information Processing Systems. 2018. [NIPS2018]

    • derived a DC-programming algorithm to calculate more accurate combination weights

    • propose normalized solutions with theoretical guarantees for cross-entropy loss, aiming to provide a solution for the MSDA problem with very practical benefit

    • tighter generalization bound and more accurate measurements

  • Zhao, Han, et al. "Adversarial multiple source domain adaptation." Advances in neural information processing systems. 2018. [NIPS2018] [Pytorch]

    • extended the generalization bound of seminal theoretical model to multiple source under both classification and regression settings.

  • Li, Yitong, and David E. Carlson. "Extracting relationships by multi-domain matching." Advances in Neural Information Processing Systems. 2018.

    • considered the relationship between pairwise sources and derived a tighter bound on weighted multi-source discrepancy

    • more relevant source domains can be picked out

  • Mancini, Massimiliano, et al. "Boosting domain adaptation by discovering latent domains." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. [CVPR2018] [Caffe] [Pytorch]

  • Xu, Ruijia, et al. "Deep cocktail network: Multi-source unsupervised domain adaptation with category shift." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. [CVPR2018] [Pytorch]

    • proposes a k-way domain adversarial classifier and category classifier to generate a combined representation for the target

  • Mansour, Yishay, Mehryar Mohri, and Afshin Rostamizadeh. "Domain adaptation with multiple sources." Advances in neural information processing systems. 2009.

    • claim that the target hypothesis can be represented by a weighted combination of source hypotheses

    • assume that the target distribution can be approximated by a mixture of the M source distributions -> weighted combination of source classifier

Continual Domain Adaptation

  • Su, Peng, et al. "Gradient Regularized Contrastive Learning for Continual Domain Adaptation." arXiv preprint arXiv:2007.12942 (2020). [pdf]

Source-Free Domain Adaptation

  • Kundu, Jogendra Nath, et al. "Balancing discriminability and transferability for source-free domain adaptation." International Conference on Machine Learning. PMLR, 2022.

  • Lee, Jonghyun, et al. "Confidence Score for Source-Free Unsupervised Domain Adaptation." International Conference on Machine Learning. PMLR, 2022.

    • PPL (confidence scores) + LPG (confidence gap) + mixup

  • Ahmed, Sk Miraj, et al. "Unsupervised Multi-source Domain Adaptation Without Access to Source Data." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

    • entails adaptation from a bag of source models; each source domain is correlated with the target to a different extent, and adaptation involves not only incorporating the combined prior knowledge from multiple models but also preventing negative transfer.

    • very similar to SHOT++, adding a weighting mechanism over the source models

  • Ishii, Masato, and Masashi Sugiyama. "Source-free Domain Adaptation via Distributional Alignment by Matching Batch Normalization Statistics." arXiv preprint arXiv:2101.10842 (2021).

    • Align batch statistics (see the sketch below)

    • Information maximization
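
One plausible instantiation of "align batch statistics": a per-channel KL divergence between the Gaussian implied by the current target batch and the Gaussian stored in the frozen source model's BatchNorm buffers; the paper's exact loss may differ:

```python
import torch

def bn_matching_loss(batch_mean, batch_var, run_mean, run_var, eps=1e-5):
    """KL( N(batch_mean, batch_var) || N(run_mean, run_var) ), summed over
    channels; run_* are the source model's BatchNorm running statistics."""
    return 0.5 * (torch.log((run_var + eps) / (batch_var + eps))
                  + (batch_var + (batch_mean - run_mean) ** 2) / (run_var + eps)
                  - 1.0).sum()
```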

  • Hou, Yunzhong, and Liang Zheng. "Visualizing Adapted Knowledge in Domain Transfer." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

    • adopted CycleGAN to generate fake source samples, subject to the constraint that they yield the same predictions from the pre-trained source model S and the target model T.

    • https://github.com/hou-yz/DA_visualization

    • use style loss to close the domain gap

  • Kurmi, Vinod K., Venkatesh K. Subramanian, and Vinay P. Namboodiri. "Domain Impression: A Source Data Free Domain Adaptation Method." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021.

    • learn the joint distribution of data by using the energy-based modeling of the trained classifier

    • It follows the previous CVPR work that learns to generate target samples with adversarial learning

    • The only difference lies in maximizing the log likelihood borrowed from https://openreview.net/pdf?id=Hkxzx0NtDB

  • Li, Rui, et al. "Model adaptation: Unsupervised domain adaptation without source data." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

    • Collaborative Class Conditional Generative Adversarial Networks (3C-GAN) for producing target-style training samples.

    • A class conditional generator + weight regularization + clustering based regularization (decision boundaries in low-density regions)

    • (1) fix C; sample noise and y to generate fake target samples that can be well classified yet cannot be distinguished by the discriminator

    • (2) use the generated images and target images to train C, with two regularizers: (a) prevent drifting away, ||C - C_old||; (b) a conditional entropy loss over all target samples

    • Very nice paper

    • comments from other papers: proposed joint training of the target model and a conditional GAN that generates annotated target data.

  • Kundu, Jogendra Nath, Naveen Venkat, and R. Venkatesh Babu. "Universal source-free domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

    • simulate negative source samples so that the model generalizes to the target

  • Kundu, Jogendra Nath, et al. "Towards inheritable models for open-set domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.

    • Low source-density region of source -> OOD

    • 1. Generate OOD samples -> mitigate overconfidence:

      • feature splicing -> replace the top-d percentile activations: u_n = φ_d(u_s^{c_i}, u_s^{c_j})

      • -> K-means clustering -> negative class label

    • 2. Quantify inheritability: mean max confidence of target / mean max confidence of source

    • 3. Adaptation: (1) select the top-k most confident samples for pseudo-labeling; (2) entropy minimization: a soft entropy loss for separating shared and unknown classes; (3) a weighted entropy loss

    • Comments from others: adopted a similar architecture but it has three modules: a backbone model, a feature extractor, and a classifier. In the adaptation phase, only the feature extractor is tuned for the target domain by minimizing the entropy of the classifier’s output

  • Liang, Jian, Dapeng Hu, and Jiashi Feng. "Do we really need to access the source data? source hypothesis transfer for unsupervised domain adaptation." International Conference on Machine Learning. PMLR, 2020.

    • Source HypOthesis Transfer (SHOT) explores retaining prior knowledge in the form of a hypothesis (the source classifier) instead of the training data inherited from previous tasks

    • Fit the target representation via information maximization: individually certain and globally diverse (sketched after these notes)

    • Two self-supervised learning strategies are applied to correct misassigned pseudo labels

      • prototype classifier

      • relative rotation prediction

    • Algorithm: (1) pre-train the feature extractor + classifier with source data; (2) train a target-specific feature encoding module with self-supervised learning; (3) semi-supervised learning with pseudo-labeling

    • Trick: label smoothing when training the source model

    • comments from other work: explicitly divided the pretrained model into two modules, called a feature encoder and a classifier, and trained the target-specific feature encoder while fixing the classifier. To make the classifier work well with the target features, this training jointly conducts both information maximization and self-supervised pseudo-labeling with the fixed classifier
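
The information-maximization objective shared by SHOT and the batch-statistics paper above combines per-sample entropy minimization with a diversity term on the mean prediction; a minimal sketch:

```python
import torch

def info_max_loss(logits):
    """Make each prediction confident (low conditional entropy) while keeping
    predictions globally diverse (high marginal entropy); minimize this."""
    p = logits.softmax(dim=1)
    cond_ent = -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()
    p_mean = p.mean(dim=0)
    marg_ent = -(p_mean * p_mean.clamp_min(1e-8).log()).sum()
    return cond_ent - marg_ent
```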

Model Selection

  • You, Kaichao, et al. "Towards accurate model selection in deep unsupervised domain adaptation." International Conference on Machine Learning. PMLR, 2019.

  • Saito, Kuniaki, et al. "Tune it the right way: Unsupervised validation of domain adaptation via soft neighborhood density." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

Label Shift?

  • A Unified View of Label Shift Estimation

  • Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift

Test-time Adaptation

  • Wang, Qin, et al. "Continual test-time domain adaptation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

    • reduce error accumulation by using weight-averaged and augmentation-averaged predictions, which are often more accurate

    • to avoid catastrophic forgetting, we propose to stochastically restore a small part of the neurons to the source pre-trained weights during each iteration to help preserve source knowledge in the long-term

(EATA) Niu, Shuaicheng, et al. "Efficient Test-Time Model Adaptation without Forgetting." ICML (2022).

  • sample-efficient optimization strategy and a weight regularizer

  • an anti-forgetting regularizer that keeps the important weights of the model from changing much during adaptation

  • calculate the weight importance based on Fisher information (Kirkpatrick et al., 2017) via a small set of test samples

(Tent) Wang, Dequan, et al. "Tent: Fully test-time adaptation by entropy minimization." ICLR (2021).

  • test entropy minimization (Tent): optimize the model for confidence as measured by the entropy of its predictions

  • estimates normalization statistics and optimizes channel-wise affine transformations, updating online on each batch (see the sketch below)
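
Tent's recipe is compact enough to sketch: freeze everything except the BatchNorm affine parameters, use current-batch statistics, and take one entropy-minimization step per test batch. A common re-implementation, assuming BatchNorm2d layers with affine=True:

```python
import torch

def collect_bn_params(model):
    """Freeze all parameters except BatchNorm affine terms (gamma, beta)."""
    model.requires_grad_(False)
    model.train()  # BatchNorm layers then use current-batch statistics
    params = []
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.requires_grad_(True)
            params += [m.weight, m.bias]
    return params

def tent_step(model, x, optimizer):
    """One online adaptation step: minimize prediction entropy on batch x."""
    p = model(x).softmax(dim=1)
    loss = -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return p.detach()
```

Typical usage: `optimizer = torch.optim.SGD(collect_bn_params(model), lr=1e-3)`, then call `tent_step` on each incoming test batch.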

MT3

(T3A) Iwasawa, Yusuke, and Yutaka Matsuo. "Test-time classifier adjustment module for model-agnostic domain generalization." Advances in Neural Information Processing Systems 34 (2021): 2427-2440.

  • adjusts a trained linear classifier (the last layer of deep neural networks) with the following procedure:

    • (1) compute a pseudo-prototype representation for each class using online unlabeled data augmented by the base classifier trained in the source domains,

    • (2) and then classify each sample based on its distance to the pseudo-prototypes.

  • backprop-free (a sketch follows)
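
A hedged sketch of the backprop-free adjustment: keep per-class pseudo-prototypes built from confident online features and classify by similarity to them (T3A additionally initializes the prototypes from the classifier weights and filters the support set by entropy; those details are omitted here):

```python
import torch
import torch.nn.functional as F

def t3a_predict(feats, prototypes):
    """Classify features (n, d) by cosine similarity to per-class
    pseudo-prototypes (num_classes, d); no gradient computation needed."""
    z = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    return (z @ protos.t()).argmax(dim=1)
```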

TTT+

(TTT) Sun, Yu, et al. "Test-time training with self-supervision for generalization under distribution shifts." International conference on machine learning. PMLR, 2020.

  • first finetunes the model via rotation classification and then makes a prediction using the updated model.