FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

† Georgia Tech, ‡ Virginia Tech

Abstract

Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components. Such methods, however, are limited to simple transformations such as traditional data augmentation or convex combinations of two images. In this paper, we propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations. Importantly, these transformations also use information from both within-class and across-class prototypical representations that we extract through clustering. We use features already computed across iterations by storing them in a memory bank, obviating the need for significant extra computation. These transformations are then used as part of the consistency-based regularization loss. We demonstrate that our method is comparable to the current state of the art on smaller datasets (CIFAR-10 and SVHN) while being able to scale up to larger datasets such as CIFAR-100 and mini-ImageNet, where we achieve significant gains over the state of the art (e.g., an absolute 17.44% gain on mini-ImageNet). We further test our method on DomainNet, demonstrating better robustness to out-of-domain unlabeled data, and perform rigorous ablations and analysis to validate the method.

Background

Driven by large-scale datasets such as ImageNet and advances in computing resources, deep neural networks have achieved strong performance on a wide variety of tasks. Training these deep neural networks, however, requires millions of labeled examples that are expensive to acquire and annotate. Consequently, numerous methods have been developed for semi-supervised learning (SSL), where a large number of unlabeled examples are available alongside a smaller set of labeled data. One branch [1, 2, 3, 4] of the most successful SSL methods uses image-based augmentation to generate different transformations of an input image, and consistency regularization to enforce invariant representations across these transformations. While these methods have achieved great success, their data augmentation is limited to transformations in the image space and fails to leverage knowledge of other instances in the dataset to produce more diverse transformations.
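To make the consistency-regularization component concrete, below is a minimal PyTorch-style sketch of the general idea behind these methods; the model and augment callables are hypothetical placeholders rather than the exact pipeline of any particular cited method.

import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, augment):
    # Minimal consistency-regularization sketch (hypothetical helper names).
    # Predictions on one transformed view serve as targets for a second,
    # independently transformed view of the same unlabeled images.
    with torch.no_grad():
        # Target distribution from the first augmented view (no gradient).
        p_target = F.softmax(model(augment(x_unlabeled)), dim=1)
    # Predictions on another augmented view of the same images.
    log_p = F.log_softmax(model(augment(x_unlabeled)), dim=1)
    # Penalize disagreement between the two views.
    return F.kl_div(log_p, p_target, reduction="batchmean")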

Feature-Based Augmentation

Image-based augmentation has been shown to be an effective approach to generate different views of an image for consistency-based SSL methods. However, conventional image-based augmentation has the following two limitations:

  1. They operate in image space, which limits the possible transformations to textural or geometric changes within an image.

  2. They operate within a single instance, and therefore cannot transform data using knowledge of other instances, whether within or outside of the same class.

In this paper, we propose to address these two limitations simultaneously. We propose a novel method that refines and augments image features in an abstract feature space rather than in image space. To efficiently leverage knowledge from other classes, we condense the information of each class into a small set of prototypes by performing clustering in the feature space. Image features are then refined and augmented through information propagated from the prototypes of all classes.
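Below is a simplified, hypothetical sketch of this idea rather than the exact FeatMatch module: per-class prototypes are obtained by clustering features stored in a memory bank, and each feature is then refined by soft attention over the prototypes of all classes. The helper names and the specific attention formulation are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

def extract_prototypes(bank_feats, bank_labels, k=3):
    # Cluster the stored features of each class into k prototypes.
    # bank_feats:  (N, D) features accumulated across iterations in a memory bank
    # bank_labels: (N,)   labels (or pseudo-labels) for those features
    # Assumes every class has at least k stored features.
    prototypes = []
    for c in bank_labels.unique():
        feats_c = bank_feats[bank_labels == c].cpu().numpy()
        centers = KMeans(n_clusters=k, n_init=10).fit(feats_c).cluster_centers_
        prototypes.append(torch.as_tensor(centers, dtype=torch.float32))
    return torch.cat(prototypes, dim=0)  # (num_classes * k, D)

class PrototypeRefiner(nn.Module):
    # Refine/augment features by attending over the prototypes of all classes.
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, feats, prototypes):
        # Soft attention weights between each feature and every prototype.
        attn = torch.softmax(
            self.q(feats) @ self.k(prototypes).t() / feats.size(1) ** 0.5, dim=1)
        # Propagate prototype information back into the feature (residual form).
        refined = feats + self.out(attn @ self.v(prototypes))
        return F.normalize(refined, dim=1)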

Qualitative Results

To give a more concrete sense of the learned feature-based augmentation (AugF), we show each augmented feature's nearest image neighbor in the feature space. Sample results are shown in the figure below, side by side with image-based augmentation (AugD). As the figure shows, AugF transforms features in an abstract way that goes beyond the simple textural and geometric transformations performed by AugD. For instance, it can augment data to different poses and backgrounds, which would be challenging for conventional image-based augmentation methods.
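For reference, retrieving an augmented feature's nearest image neighbor amounts to a cosine-similarity lookup against a gallery of precomputed features; the sketch below uses hypothetical variable names and assumes the gallery has already been encoded by the same network.

import torch
import torch.nn.functional as F

def nearest_image_neighbor(aug_feat, gallery_feats, gallery_images):
    # aug_feat:       (D,)   feature produced by feature-based augmentation (AugF)
    # gallery_feats:  (N, D) features of candidate images from the dataset
    # gallery_images: the N corresponding images (list or tensor)
    sims = F.cosine_similarity(aug_feat.unsqueeze(0), gallery_feats, dim=1)
    return gallery_images[sims.argmax().item()]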

Quantitative Results

Robustness of the proposed method to out-of-domain unlabeled data

Comparison between the image-based baseline and our proposed feature-based augmentation method on DomainNet for various values of r_u, the ratio of unlabeled data coming from shifted domains. For instance, r_u = 25% means that 25% of the unlabeled data come from shifted domains and 75% come from the same domain as the labeled set. Numbers are error rates averaged across 3 runs (lower is better).
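As a small illustration of how an unlabeled pool for a given r_u can be composed (a hypothetical helper, not the exact sampling protocol used in the experiments):

import random

def build_unlabeled_pool(in_domain, shifted_domain, r_u, total):
    # r_u = 0.25 draws 25% of the unlabeled images from the shifted domains
    # and the remaining 75% from the same domain as the labeled set.
    n_shifted = int(round(r_u * total))
    pool = (random.sample(shifted_domain, n_shifted)
            + random.sample(in_domain, total - n_shifted))
    random.shuffle(pool)
    return pool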

Comparison on CIFAR-100 and mini-ImageNet

Numbers represent error rates averaged across three runs. For a fair comparison, we use the same models as other methods: CNN-13 for CIFAR-100 and ResNet-18 for mini-ImageNet.

Comparison on CIFAR-10 and SVHN

Numbers represent error rates averaged across three runs. The results in the first block, which use the CNN-13 model, are taken from the original papers. The results in the second block, which use a Wide ResNet (WRN), are reproduced by MixMatch.

References

[1] Sajjadi, Mehdi, et al. "Regularization with stochastic transformations and perturbations for deep semi-supervised learning." Advances in Neural Information Processing Systems. 2016.

[2] Berthelot, David, et al. "Mixmatch: A holistic approach to semi-supervised learning." Advances in Neural Information Processing Systems. 2019.

[3] Berthelot, David, et al. "Remixmatch: Semi-supervised learning with distribution matching and augmentation anchoring." International Conference on Learning Representations. 2019.

[4] Sohn, Kihyuk, et al. "Fixmatch: Simplifying semi-supervised learning with consistency and confidence." arXiv preprint arXiv:2001.07685 (2020).

Resources

GitHub (coming soon)

@inproceedings{kuo2020featmatch,
title={FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning},
author={Kuo, Chia-Wen and Ma, Chih-Yao and Huang, Jia-Bin and Kira, Zsolt},
booktitle={European Conference on Computer Vision},
year={2020},
organization={Springer}
}