Unsupervised Domain Adaptation via Data Pruning for Enhancing the Robustness of PAM Systems
A key challenge in developing machine learning (ML) methods for automated passive acoustic monitoring (PAM) systems is their lack of robustness to changes in location, data collection hardware, and other environmental factors. Removing carefully selected examples from the training data has recently emerged as an effective way of improving the robustness of ML models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose a UDA method in which training examples are removed so as to align the training distribution with that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. On a real-world domain shift problem of bioacoustic event detection, we show that data pruning outperforms related approaches such as importance weighting, and is complementary to other UDA techniques such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validates the proposed method as a principled and well-founded way of performing data pruning.
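To make the idea of MMD-based pruning concrete, the following is a minimal sketch rather than the method of the paper. It assumes an RBF kernel with a hand-picked bandwidth `gamma`, uses the biased estimate of the squared MMD, and replaces the integer quadratic program with a simple greedy heuristic that repeatedly drops the training example whose removal lowers the MMD the most. The function names (`rbf_kernel`, `mmd2`, `greedy_prune`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


def greedy_prune(source, target, n_remove, gamma=1.0):
    """Greedily drop the source rows whose removal lowers MMD^2 the most.

    Assumption: this heuristic is a stand-in for illustration only,
    not the integer quadratic program described in the abstract.
    """
    keep = list(range(len(source)))
    for _ in range(n_remove):
        scores = [
            mmd2(source[[j for j in keep if j != i]], target, gamma)
            for i in keep
        ]
        keep.pop(int(np.argmin(scores)))
    return np.array(keep)


# Synthetic example: the training set contains a cluster that is
# absent from the target data.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 0.5, size=(30, 2))
source = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),  # matches the target
    rng.normal(5.0, 0.5, size=(10, 2)),  # shifted cluster
])

before = mmd2(source, target)
keep = greedy_prune(source, target, n_remove=10)
after = mmd2(source[keep], target)
print(f"MMD^2 before pruning: {before:.4f}, after pruning: {after:.4f}")
```

In this sketch the number of examples to remove is fixed in advance. Under that assumption the biased squared MMD is a quadratic function of a binary keep/remove vector, which is one plausible reading of why the selection problem can be posed as an integer quadratic program; the greedy loop only approximates the solution of such a program.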