Is user feedback always informative?
Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
Junha Song1,2 Tae Soo Kim2 Junha Kim2 Gunhee Nam2 Thijs Kooi2 Jaegul Choo1
1KAIST 2Lunit Inc.
Research Motive
💁♂️ Have you heard about semi-supervised domain adaptation (SemiSDA)? It's an interesting approach where we adapt the source model to the target environment using both labeled and unlabeled target data.
🙋♀️ That sounds intriguing. But I'm curious, how do we collect labeled data in target devices like smartphones or medical applications?
🙍♂️ One way could be to collect small amounts of feedback from users.
💁♀️ Um... I've been thinking about user feedback, actually. From a psychological perspective, users tend to be more likely to provide feedback on data points where the model makes incorrect predictions. Have you considered this angle?
🤦♂️ No, and previous works hadn't considered it either. That's an interesting observation! [🔬 conducting some experiments...]
🤷♂️ I've been looking into existing SemiSDA methods, and I've noticed they often yield suboptimal results in this scenario.
Poster 🔗
Introduction
Despite advances in domain adaptation (DA) techniques, adapting a model with user feedback remains an open research area, even though practical machine learning (ML) products often let users provide feedback to further improve the model in the target environment.
Classical SemiSDA works simply assume that users label a random subset of the target data. However, as illustrated in the figure below, we argue that users are more likely to provide feedback on samples misclassified by the source model, which we call negatively biased feedback (NBF).
We introduce this novel concept (NBF) and investigate its impact on SemiSDA. Our research reveals that: (1) NBF can significantly affect the performance of existing SemiSDA methods, often leading to sub-optimal adaptation results. (2) The impact of NBF on SemiSDA is counterintuitive and more complex than initially anticipated. (3) Specifically, we identify that when NBF is unevenly distributed within the overall data distribution, it results in sub-optimal adaptation performance.
To address this problem, we present a scalable approach named Retrieval Latent Defending (RLD), which can be seamlessly integrated with existing SemiSDA methods. Our approach allows them to adapt the model without a strong dependence on the biased distribution of the labeled data.
NBF: Negatively Biased Feedback
1) What is NBF?
Negatively Biased Feedback (NBF) is based on the observation that user feedback is more likely to be derived from incorrect model predictions. For example, a radiologist might log a chest X-ray misdiagnosed by the model, since diagnostic accuracy directly impacts patient survival.
This behavior can be understood from two perspectives: (a) users generally expect their feedback to serve as a basis for model improvement, motivating them to provide NBF, and (b) humans tend to react more strongly to negative experiences, such as receiving incorrect predictions, as observed in previous psychological studies.
2) Influence of NBF on SemiSDA
We conducted a simulation study to understand the effect of NBF on SemiSDA (details are in Figure 3). We observe that: (1) user feedback collected in practice (NBF) is biasedly distributed within each class cluster, in contrast to random feedback (i.e., the previous SemiSDA setup); (2) existing SemiSDA methods let these biasedly positioned labeled points dominate adaptation; and (3) as a result, NBF prevents the model from forming decision boundaries around the true class clusters, leading to inferior adaptation performance.
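The contrast between the two feedback types can be illustrated with a minimal sampling sketch. Note that `select_feedback` and the toy data below are hypothetical illustrations of the setup, not the paper's actual simulation code:

```python
import numpy as np

def select_feedback(preds, labels, budget, mode="nbf", rng=None):
    """Select indices of target samples to be labeled via user feedback.

    Two feedback models:
    - "rf":  a uniformly random subset (classical SemiSDA assumption)
    - "nbf": drawn only from samples the source model misclassified
    """
    rng = rng or np.random.default_rng(0)
    idx = np.arange(len(labels))
    if mode == "nbf":
        pool = idx[preds != labels]  # users flag wrong predictions
    else:
        pool = idx
    return rng.choice(pool, size=min(budget, len(pool)), replace=False)

# toy example: 100 samples, source model is wrong on the first 30
labels = np.zeros(100, dtype=int)
preds = labels.copy()
preds[:30] = 1  # misclassified
nbf_idx = select_feedback(preds, labels, budget=10, mode="nbf")
assert all(i < 30 for i in nbf_idx)  # NBF concentrates on model errors
```

Because NBF draws only from the error pool, the labeled points cluster near the source model's failure regions rather than spreading across each class, which is exactly the biased positioning observed in the simulation.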
3) Counterintuitive Effect of NBF
Intuition suggests that NBF should be more informative than RF, since it corrects more of the source model's deficiencies and should therefore yield better adaptation. Counterintuitively, our experiments show the opposite: adaptation with NBF underperforms adaptation with RF.
Our work highlights the importance of careful design when using user feedback in real-world scenarios and, to the best of our knowledge, is the first study to uncover and analyze this phenomenon.
RLD: Retrieval Latent Defending
Since previous SemiSDA methods have overlooked the unexpected impact of NBF, they often suffer from sub-optimal performance under the NBF assumption. To address this problem, we focus on developing a scalable solution that (i) can easily combine with existing DA methods without modifying their core adapting strategies and (ii) can be applied to a wide range of benchmarks.
Our RLD involves the following steps:
① Prior to each epoch, we generate a candidate bank of data points.
②~④ For each adaptation iteration, we balance the mini-batch by retrieving latent defending samples from the bank.
⑤~⑥ The model is then adapted on the reconfigured mini-batch, following the baseline SemiSDA approach.
We hypothesize that the latent space progressively created by the candidates throughout the adaptation process mitigates the issue caused by NBF.
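The steps above can be sketched roughly as follows. The function names, the use of pseudo labels to build the candidate bank, and the per-class balancing target are illustrative assumptions, not the paper's exact implementation:

```python
import random
from collections import defaultdict

def build_candidate_bank(unlabeled, pseudo_labels):
    """Step ①: before each epoch, group unlabeled samples by pseudo label."""
    bank = defaultdict(list)
    for x, y in zip(unlabeled, pseudo_labels):
        bank[y].append(x)
    return bank

def balance_minibatch(labeled_batch, bank, per_class, rng=None):
    """Steps ②~④: append retrieved latent defending samples so that each
    class appears (at least) per_class times, countering the bias of the
    labeled (NBF) points in the batch."""
    rng = rng or random.Random(0)
    counts = defaultdict(int)
    for _, y in labeled_batch:
        counts[y] += 1
    defended = list(labeled_batch)
    for cls, candidates in bank.items():
        need = per_class - counts[cls]
        if need > 0 and candidates:
            picks = rng.sample(candidates, min(need, len(candidates)))
            defended += [(x, cls) for x in picks]
    return defended

# toy usage: the labeled batch is biased toward class 0
bank = build_candidate_bank(["u0", "u1", "u2", "u3"], [0, 0, 1, 1])
batch = [("x0", 0), ("x1", 0)]
balanced = balance_minibatch(batch, bank, per_class=2)
```

The reconfigured mini-batch (steps ⑤~⑥) is then fed to the unmodified baseline SemiSDA update, which is what makes the approach a drop-in addition rather than a new adaptation strategy.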
Experimental Results
We evaluate NBF's unexpected influence using various benchmarks, including image classification🐕🐈, medical image diagnosis🩻🏥, and semantic segmentation 🚗🚃. Building upon these evaluations, we demonstrate that our approach not only complements but significantly enhances the performance of multiple SemiSDA methods.
Feedback given by users is modeled as annotations on a small subset of the target training set, while the remainder is used as unlabeled target data. We consider two types of feedback: random feedback (RF) and negatively biased feedback (NBF).
While both feedback types improve performance over the source model, NBF yields lower performance than RF. Our method enables the baselines not only to address this problem but also to surpass the performance achieved under RF.
Cite
@inproceedings{song2024nbfrld,
title={Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data},
author={Junha Song and Tae Soo Kim and Junha Kim and Gunhee Nam and Thijs Kooi and Jaegul Choo},
booktitle={The European Conference on Computer Vision (ECCV)},
year={2024}
}