RLSbench: Domain Adaptation Under Relaxed Label Shift
(To appear at) International Conference on Machine Learning (ICML), 2023
TL;DR -- A large-scale study of domain adaptation methods in scenarios where both the label distribution p(y) and the class conditionals p(x|y) may shift. We highlight the brittleness of existing methods and present simple fixes that improve performance.
Abstract
Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) the target label distribution. The meta-algorithm improves existing domain adaptation heuristics, often by 2-10 accuracy points under extreme label proportion shifts, and has little (i.e., <0.5%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings.
Motivation and Setup
Most benchmarks for Domain Adaptation (DA) exhibit little or no shift in the label proportion from source to target.
Consequently, benchmark-driven research has produced a variety of heuristic DA methods that, despite yielding gains in benchmark performance, tend to break when p(y) shifts.
In this paper, we develop RLSbench, a standardized test bed of relaxed label shift settings, where p(y) can shift arbitrarily and the class conditionals p(x|y) can shift in seemingly natural ways.
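To make the setup concrete, here is a minimal sketch of one way to construct such shift pairs: draw target label proportions from a symmetric Dirichlet distribution (the paper uses a Dirichlet concentration parameter alpha to control shift severity; smaller alpha means a more extreme shift) and subsample a dataset to match. The helper name below is our own, not from the released code:

```python
import numpy as np

def sample_shifted_indices(labels, alpha, rng=None):
    """Subsample a labeled dataset so its label marginal follows a Dirichlet(alpha) draw.

    Smaller alpha yields more skewed target label proportions (severer shift);
    larger alpha yields proportions closer to uniform.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Draw target label proportions from a symmetric Dirichlet prior.
    target_props = rng.dirichlet(alpha * np.ones(len(classes)))
    # Largest total sample size feasible given per-class availability.
    counts = np.array([(labels == c).sum() for c in classes])
    n = int(np.floor((counts / np.maximum(target_props, 1e-12)).min()))
    idx = []
    for c, p in zip(classes, target_props):
        k = int(np.floor(p * n))  # per-class quota under the sampled marginal
        pool = np.flatnonzero(labels == c)
        idx.append(rng.choice(pool, size=k, replace=False))
    return np.concatenate(idx)
```

Applied to each target domain of a multi-domain dataset, this yields distribution shift pairs with varying severity of label proportion shift on top of the natural shift in p(x|y).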
We develop an effective two-step meta-algorithm that is compatible with most DA heuristics. Our meta-algorithm significantly improves existing DA methods under extreme label proportion shifts.
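The second step of the meta-algorithm, adjusting the final classifier with an estimate of the target label distribution, follows the standard label shift correction: under label shift, p_t(y|x) is proportional to p_s(y|x) * q(y)/p(y). A minimal sketch, assuming the target prior has already been estimated (the function name is ours, not from the released code):

```python
import numpy as np

def adjust_predictions(probs, source_prior, target_prior_est, eps=1e-12):
    """Re-weight source-trained predictions toward an estimated target label marginal.

    probs: (n, k) array of model probabilities p_s(y|x) on target inputs.
    Returns row-normalized probabilities proportional to p_s(y|x) * q(y)/p(y).
    """
    w = np.asarray(target_prior_est) / np.maximum(np.asarray(source_prior), eps)
    adjusted = probs * w  # broadcast the per-class importance weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```

For example, with a uniform source prior and an estimated target prior of (0.9, 0.1), a 50/50 prediction is pushed to (0.9, 0.1), shifting the decision toward the class that is more prevalent in the target.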
RLSbench: Relaxed Label Shift Benchmark
A standardized test bed of >500 distribution shift pairs with varying severity of shift in target class proportions across 14 multi-domain datasets.
We evaluate a collection of 13 popular DA methods: (i) Domain invariant learning, e.g., DANN, CDANN, IW-CDANN; (ii) Self-training, e.g., PseudoLabel, FixMatch, NoisyStudent, SENTRY; (iii) Test-time adaptation, e.g., TENT, BN-adapt, CORAL.
RLSbench Datasets
Example of RLSbench settings on CIFAR10
Popular DA Methods Falter When Faced With Shifts in Target Label Proportions
Meta Algorithm Summary
We implement two simple general-purpose corrections: (i) Re-Sampling (RS) and (ii) Re-Weighting (RW).
Performance of DA Methods Improves When Paired With Our Meta-Algorithm (RS+RW)
We show results with three algorithms on the vision modality. Results with other methods and modalities are in the paper.
Citing
To cite this paper, please use the following reference:
Authors
Sivaraman Balakrishnan
Zachary Lipton
For questions, please contact us at: sgarg2@andrew.cmu.edu