Regularization for Convolutional Neural Networks

GridMix: Strong regularization through local context mapping

People (*: equal contribution)

Kyungjune Baek1*, Duhyeon Bang2*, Hyunjung Shim1

1 School of Integrated Technology, Yonsei University 2 Vision AI Labs, T3K, SK telecom

Abstract

Recently developed regularization techniques improve network generalization by considering only the global context. As a result, the network tends to focus on a few of the most discriminative subregions of an image for prediction accuracy, making it sensitive to unseen or noisy data. To address this disadvantage, we introduce the concept of local context mapping by predicting patch-level labels, and combine it with local data augmentation through grid-based mixing, called GridMix. Through our analysis of intermediate representations, we show that GridMix can effectively regularize the network model. Finally, our evaluation results indicate that GridMix outperforms state-of-the-art techniques in classification and adversarial robustness, and achieves comparable performance in weakly supervised object localization.

Overview

Overview of GridMix. Training procedure of GridMix. x̃ represents a mixed image generated from A and B, ỹ represents a mixed label, y_p^i represents the patch label at the i-th cell, and f(·) indicates our network model. We define the global context loss to model f_g and the local context loss to model f_l, where the two functions share all layers before the last (classification) layer. f_l(x̃)_i outputs the predicted class label of the i-th cell, and each predicted class label is used to calculate the cross-entropy loss with y_p^i.
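The training step described above can be sketched in plain NumPy. This is a hypothetical re-implementation for illustration, not the authors' released code: `gridmix` draws an N×N Bernoulli cell mask to mix two images and produce the mixed label weight and per-cell patch labels, and `gridmix_loss` combines the global context loss (cross-entropy against the mixed label) with the local context loss (cross-entropy per grid cell).

```python
import numpy as np

def gridmix(img_a, img_b, label_a, label_b, n=4, p=0.5, rng=None):
    """Mix two (H, W, C) images with an n x n grid mask whose cells are
    drawn from Bernoulli(p). Assumes H and W are divisible by n.
    Returns the mixed image x~, the mixing ratio lam (fraction of
    pixels taken from A), and the per-cell patch labels y_p."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]
    mask_cells = rng.binomial(1, p, size=(n, n))  # 1 -> keep A, 0 -> take B
    # Upsample the cell mask to pixel resolution.
    mask = np.kron(mask_cells, np.ones((h // n, w // n)))[..., None]
    mixed = mask * img_a + (1 - mask) * img_b
    lam = mask_cells.mean()
    patch_labels = np.where(mask_cells == 1, label_a, label_b).ravel()
    return mixed, lam, patch_labels

def _ce(logits, label):
    """Cross-entropy of a single logit vector against an integer label."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def gridmix_loss(global_logits, patch_logits, label_a, label_b, lam, patch_labels):
    """Global context loss (mixed label) + local context loss (patch labels).
    patch_logits has one logit vector per grid cell, i.e. the output of f_l."""
    global_loss = lam * _ce(global_logits, label_a) + (1 - lam) * _ce(global_logits, label_b)
    local_loss = np.mean([_ce(patch_logits[i], patch_labels[i])
                          for i in range(len(patch_labels))])
    return global_loss + local_loss
```

In a real training loop, `global_logits` would come from f_g and `patch_logits` from the cell-wise classifier f_l, with the two heads sharing all layers before the final classification layer as described in the figure.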

Experimental Results

Classification accuracy on CIFAR100 and Tiny ImageNet. Comparisons of the baseline model, MixUp, CutMix, and GridMix on the image classification task. The number following the network name indicates the number of layers (i.e., the depth). In the case of WRN, the two numbers represent the depth and the width, respectively. The notations "w." and "wo." are abbreviations of with and without, respectively. The bold text indicates the best performance in comparison with the competitors; this indication is used throughout the paper.

Classification accuracy on ImageNet. Evaluating the scalability on a large-scale dataset. The table compares classification performance of the baseline model, MixUp, CutMix, and GridMix on the ILSVRC2012 dataset.

Ablation study on hyperparameters. Ablation study for two hyperparameters on the image-classification task. We changed the size of the grid (i.e., N×N) and the parameter of the Bernoulli distribution for generating the mixed image. In this experiment, VGG19 was utilized as the backbone network. Acc. indicates the classification accuracy.
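The roles of the two hyperparameters can be illustrated with a toy mask generator (an assumed interface for illustration, not the authors' code): N sets the spatial granularity of mixing, and the Bernoulli parameter p sets the expected fraction of cells kept from the first image.

```python
import numpy as np

def grid_mask(n, p, rng):
    """Draw an n x n cell mask from Bernoulli(p); 1 keeps image A, 0 takes image B."""
    return rng.binomial(1, p, size=(n, n))

rng = np.random.default_rng(42)
# A finer grid (larger N) mixes the two sources at a smaller spatial scale,
# while p shifts the expected area ratio between the two images.
for n, p in [(2, 0.5), (4, 0.5), (4, 0.8)]:
    m = grid_mask(n, p, rng)
    print(f"N={n}, p={p}: fraction from A = {m.mean():.2f}")
```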

Publication

GridMix: Strong Regularization Through Local Context Mapping

K. Baek*, D. Bang*, H. Shim, Pattern Recognition, 2021 (to appear)

Links

[pdf]