
             Feature Scattering Adversarial Training

                                     Haichao Zhang           Jianyu Wang

                          Advances in Neural Information Processing Systems (NeurIPS) 2019


                                          [code released on Github]

Abstract

We introduce a feature scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffers from issues such as label leaking, as noted in recent works. In contrast, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking the inter-sample relationships into consideration. We analyze model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets, in comparison with state-of-the-art approaches.

Motivation

It has been pointed out that models trained only on clean data tend to focus on discriminative but less robust features, and are therefore vulnerable to adversarial attacks. Conventional supervised attacks, which move feature points towards the decision boundary, are thus likely to disregard the structure of the original data manifold. Where the decision boundary lies close to the off-manifold region, adversarial perturbations tilt the data manifold; where the boundary is far from the off-manifold region, the perturbations move points towards the boundary, effectively shrinking the data manifold. Moreover, since adversarial examples reside in large, contiguous regions and a significant portion of the adversarial subspaces is shared, purely label-guided adversarial examples tend to cluster, at least within the shared adversarial subspace. In summary, while these effects encourage the model to focus more on the current decision boundary, they also make the effective data manifold used for training deviate from the original one, potentially hindering performance.

Feature scattering shifts the focus from the decision boundary to the inter-sample structure. The proposed approach can be intuitively understood as generating adversarial examples by perturbing the local neighborhood structure in an unsupervised fashion and then performing model training with the generated adversarial images. The overall framework is shown below.

Feature Scattering-based Adversarial Training Pipeline. The adversarial perturbations are generated collectively by feature scattering, i.e., maximizing the feature matching distance between the clean samples and the perturbed samples. The model parameters are updated by minimizing the cross-entropy loss using the perturbed images as the training samples.
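To make the pipeline concrete, here is a minimal PyTorch-style sketch of the training loop, assuming a standard classifier and data loader; feature_scatter_attack is a hypothetical helper (sketched further below) that produces the perturbed batch by maximizing the feature matching distance. This is an illustrative sketch, not the released implementation.

import torch.nn.functional as F

def train_epoch(model, loader, optimizer, epsilon=8 / 255, device="cuda"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # 1) Generate perturbed images collaboratively via feature scattering
        #    (unsupervised: labels are not used to craft the perturbation).
        x_adv = feature_scatter_attack(model, x, epsilon=epsilon)

        # 2) Update model parameters by minimizing the cross-entropy loss
        #    on the perturbed batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()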

Feature Scattering-based Adversarial Training

Feature Matching and Feature Scattering

Definition 1. (Feature Matching Distance) The feature matching distance between two sets of images is defined as D(µ, ν), the optimal transport (OT) distance between the empirical distributions µ and ν of the two sets.
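As an illustration, the feature matching distance can be approximated with a few Sinkhorn iterations (an entropy-regularized approximation of OT) between the feature batches of the two image sets. The cosine ground cost, regularization strength, and iteration count below are assumptions made for this sketch, not necessarily the paper's exact choices.

import torch
import torch.nn.functional as F

def feature_matching_distance(f_clean, f_pert, eps=0.1, n_iters=30):
    # f_clean, f_pert: (n, d) feature matrices of the clean and perturbed sets.
    n = f_clean.size(0)

    # Pairwise ground cost between features (here: cosine distance).
    C = 1.0 - F.normalize(f_clean, dim=1) @ F.normalize(f_pert, dim=1).t()

    # Uniform marginals for the two empirical distributions mu and nu.
    mu = torch.full((n,), 1.0 / n, device=C.device)
    nu = torch.full((n,), 1.0 / n, device=C.device)

    # Plain (non-stabilized) Sinkhorn iterations.
    K = torch.exp(-C / eps)
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.t() @ u + 1e-8)
        u = mu / (K @ v + 1e-8)

    T = torch.diag(u) @ K @ torch.diag(v)   # approximate transport plan
    return (T * C).sum()                    # D(mu, nu)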

Definition 2. (Feature Scattering) Given a set of clean data, represented as an empirical distribution µ, the feature scattering procedure is defined as producing a perturbed empirical distribution ν by maximizing D(µ, ν), the feature matching distance between µ and ν, subject to domain and budget constraints.
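A minimal sketch of the corresponding perturbation step is given below, assuming the model exposes a features(...) method returning the latent representation and reusing feature_matching_distance from the sketch above; the l_inf budget, step size, and single iteration are illustrative defaults rather than the released configuration.

import torch

def feature_scatter_attack(model, x, epsilon=8 / 255, step_size=8 / 255, n_steps=1):
    # Features of the clean batch define the reference distribution mu.
    with torch.no_grad():
        f_clean = model.features(x)

    # Random start inside the epsilon-ball, clipped to the valid image range.
    x_adv = (x + epsilon * (2 * torch.rand_like(x) - 1)).clamp(0, 1).detach()

    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        d = feature_matching_distance(f_clean, model.features(x_adv))
        grad = torch.autograd.grad(d, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                        # ascend on D(mu, nu)
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # budget constraint
            x_adv = x_adv.clamp(0, 1)                                      # domain constraint
    return x_adv.detach()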

Illustration of different perturbation schemes. (a) Original data. Perturbed data using (b) a supervised adversarial generation method and (c) the proposed feature scattering, which is an unsupervised method. The overlaid decision boundary is from the model trained on clean data.

Remark. As feature scattering is performed on a batch of samples and leverages the inter-sample structure, it is more effective as an adversarial attack than structure-agnostic random perturbation, while being less constrained than perturbations generated in a supervised manner, which are decision-boundary oriented and suffer from label leaking.

Feature Scattering Adversarial Training

We leverage feature scattering for adversarial training, with the mathematical formulation as follows.
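In a form reconstructed from Definitions 1 and 2 (the notation here is assumed and may differ from the paper's), the perturbed images x'_i are produced by feature scattering, and the model is then trained on them with the cross-entropy loss:

\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(f_{\theta}(x_i'),\, y_i\big)
\quad \text{s.t.} \quad
\nu^{*} = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i'} = \arg\max_{\nu \in \mathcal{S}} \, \mathcal{D}(\mu, \nu),
\qquad \mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}

where L is the cross-entropy loss, f_θ is the model, and S denotes the feasible set given by the domain and budget constraints of Definition 2.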

Results

White-box Attacks

                                                    Results on CIFAR10

Model performance under PGD attacks with different (a) attack budgets and (b-c) attack iterations. The Madry and Proposed models are trained with 7 and 1 attack iterations, respectively.

Loss Surface

Loss surface visualization in the vicinity of a natural image, along the adversarial direction (d_a) and the direction of a Rademacher vector (d_r), for the (a) Standard, (b) Madry, and (c) Proposed models.

Black-box Attack

Black-box attack results on CIFAR10. Together with the white-box results, these demonstrate that the improved robustness is indeed due to genuine model improvement rather than gradient masking.

Related Publications and Resources

Haichao Zhang and Jianyu Wang, Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training, NeurIPS 2019.



@inproceedings{feature_scatter,
    author    = {Haichao Zhang and Jianyu Wang},
    title     = {Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2019}
}