RoCL

Adversarial Self-Supervised Contrastive Learning

Korea Advanced Institute of Science and Technology (KAIST)

Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels.

However, do we really need class labels at all for adversarially robust training of deep neural networks?

In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. Further, we present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data, which aims to maximize the similarity between a random augmentation of a data sample and its instance-wise adversarial perturbation.

We refer to this novel adversarial self-supervised learning method as Robust Contrastive Learning (RoCL).

Our intuition is that we can fool the model by generating instance-wise adversarial examples. Specifically, we generate perturbations on augmentations of the samples to maximize their contrastive loss, so that the instance-level classifier becomes confused about the identities of the perturbed samples. Then, we maximize the similarity between clean samples and their adversarial counterparts using contrastive learning. This yields representations that suppress the distortions caused by adversarial perturbations and are therefore robust against adversarial attacks.
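As a rough illustration of the loss that the attack maximizes and that training minimizes, here is an NT-Xent-style contrastive loss with a set of positives and a set of negatives, written in plain Python over feature vectors. The use of cosine similarity, the temperature tau = 0.5 and the function names are assumptions for this sketch, not taken from the RoCL code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def contrastive_loss(anchor: Vector, positives: List[Vector],
                     negatives: List[Vector], tau: float = 0.5) -> float:
    """Negative log of the softmax mass the anchor puts on its positives.

    Small when the anchor is much closer to its positives than to its
    negatives; large when its instance identity is confused.
    """
    pos = [math.exp(cosine(anchor, p) / tau) for p in positives]
    neg = [math.exp(cosine(anchor, n) / tau) for n in negatives]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))
```

For example, an anchor `[1.0, 0.0]` with positive `[0.9, 0.1]` gets a lower loss than the same anchor whose "positive" points along one of its negatives; an instance-wise attack tries to push a sample toward the second situation.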

Instance-wise attack

•Generates a perturbation that fools the model by confusing the instance-level identity of the perturbed sample

•Maximizes the self-supervised contrastive loss
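The instance-wise attack can be sketched as L-infinity PGD that ascends the contrastive loss of a perturbed view, with no class labels involved. To stay self-contained, the sketch uses a hypothetical toy linear encoder and central finite-difference gradients instead of a deep network with autograd; the names (`encode`, `instance_wise_attack`), eps = 0.3, step size 0.1 and 10 steps are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def encode(W: List[Vector], x: Vector) -> Vector:
    """Hypothetical toy encoder z = W x, standing in for a deep network."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(anchor: Vector, positives: List[Vector],
                     negatives: List[Vector], tau: float = 0.5) -> float:
    pos = sum(math.exp(cosine(anchor, p) / tau) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def instance_wise_attack(W: List[Vector], x: Vector, x_pos: Vector,
                         negatives: List[Vector], eps: float = 0.3,
                         alpha: float = 0.1, steps: int = 10,
                         h: float = 1e-4) -> Tuple[Vector, float]:
    """L-inf PGD that maximizes the contrastive loss of x + delta.

    No class labels: the only 'target' is the instance identity, i.e.
    x_pos (another view of the same sample) versus the negatives.
    """
    z_pos = [encode(W, x_pos)]
    z_neg = [encode(W, n) for n in negatives]

    def loss(delta: Vector) -> float:
        x_adv = [xi + di for xi, di in zip(x, delta)]
        return contrastive_loss(encode(W, x_adv), z_pos, z_neg)

    delta = [0.0] * len(x)
    best_delta, best_loss = list(delta), loss(delta)
    for _ in range(steps):
        # Central finite-difference estimate of d loss / d delta.
        grad = []
        for i in range(len(x)):
            up, down = list(delta), list(delta)
            up[i] += h
            down[i] -= h
            grad.append((loss(up) - loss(down)) / (2 * h))
        # Signed ascent step, then projection onto the eps-ball by clipping.
        delta = [max(-eps, min(eps, d + alpha * sign(g)))
                 for d, g in zip(delta, grad)]
        current = loss(delta)
        if current > best_loss:
            best_delta, best_loss = list(delta), current
    return best_delta, best_loss
```

Returning the strongest perturbation seen, rather than the last iterate, means this sketch never returns a perturbation whose loss is below the clean loss.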

Robust Contrastive Learning

•RoCL: a framework that learns robust representations via self-supervised contrastive learning

By using the instance-wise adversarial examples as additional elements in the positive set, we can train the model with a self-supervised contrastive learning objective.
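One way to write the attack and the training objective down, as a sketch rather than a quotation of the paper's equations: g(·) denotes the encoder followed by the projector, sim is cosine similarity, τ a temperature, {t(x)_neg} the other augmented samples in the batch, and ε, α, K the attack radius, step size and number of steps. These symbols are assumptions introduced here.

```latex
% Contrastive loss with a set of positives {u^+} and negatives {u^-}
\mathcal{L}_{\mathrm{con}}\big(u, \{u^{+}\}, \{u^{-}\}\big)
  = -\log \frac{\sum_{u^{+}} \exp\!\big(\mathrm{sim}(g(u), g(u^{+}))/\tau\big)}
               {\sum_{u^{+}} \exp\!\big(\mathrm{sim}(g(u), g(u^{+}))/\tau\big)
                + \sum_{u^{-}} \exp\!\big(\mathrm{sim}(g(u), g(u^{-}))/\tau\big)}

% Instance-wise attack: L_inf PGD ascent on the contrastive loss (no labels)
\delta^{k+1} = \Pi_{\|\delta\|_\infty \le \epsilon}\Big(\delta^{k} + \alpha\,
  \mathrm{sign}\big(\nabla_{\delta}\,\mathcal{L}_{\mathrm{con}}(t(x)+\delta^{k}, \{t'(x)\}, \{t(x)_{\mathrm{neg}}\})\big)\Big),
\qquad t(x)^{\mathrm{adv}} = t(x) + \delta^{K}

% RoCL training: the adversarial example joins the positive set
\mathcal{L}_{\mathrm{RoCL}} = \mathcal{L}_{\mathrm{con}}\big(t(x), \{t'(x), t(x)^{\mathrm{adv}}\}, \{t(x)_{\mathrm{neg}}\}\big)
```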

Linear evaluation

During adversarial training, we maximize the similarity among two differently transformed examples {t(x), t'(x)} and the adversarial perturbation of one of them, t(x)^adv. Once the model is fully trained to be robust, we evaluate it on the target classification task by replacing the projector with a linear classifier.

We could either train the linear classifier only on clean examples (LE), or adversarially train it with class-adversarial examples (r-LE).
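The difference between LE and r-LE can be sketched on a toy problem: a frozen (here, linear) encoder E, a binary logistic-regression head trained on top of it, and, for r-LE, class-adversarial inputs generated with a single FGSM step. The toy encoder, binary labels, FGSM in place of a multi-step attack, and all hyperparameters are assumptions for illustration; in both modes only the classifier weights are updated.

```python
import math
from typing import List, Tuple

Vector = List[float]


def matvec(M: List[Vector], x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(E: List[Vector], w: Vector, x: Vector, y: int, eps: float) -> Vector:
    """Class-adversarial example for a logistic head on frozen features.

    For loss log(1 + exp(-y * (w . E x + b))) the input gradient is
    -y * sigmoid(-margin) * E^T w; sigmoid(.) > 0, so its sign is
    sign(-y * E^T w) and the bias b does not affect it.
    """
    et_w = [sum(E[r][c] * w[r] for r in range(len(E))) for c in range(len(x))]
    return [xi + eps * sign(-y * g) for xi, g in zip(x, et_w)]


def train_linear_eval(E: List[Vector], data, robust: bool = False,
                      eps: float = 0.1, lr: float = 0.1,
                      epochs: int = 50) -> Tuple[Vector, float]:
    """Train only (w, b) on top of the frozen encoder E.

    robust=False: linear evaluation (LE) on clean inputs.
    robust=True:  robust linear evaluation (r-LE) on FGSM examples.
    """
    w, b = [0.0] * len(E), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_in = fgsm(E, w, x, y, eps) if robust else x
            f = matvec(E, x_in)  # the encoder is never updated
            margin = y * (sum(wi * fi for wi, fi in zip(w, f)) + b)
            coef = -y / (1.0 + math.exp(margin))  # d loss / d score
            w = [wi - lr * coef * fi for wi, fi in zip(w, f)]
            b -= lr * coef
    return w, b


def accuracy(E: List[Vector], w: Vector, b: float, data,
             eps: float = 0.0) -> float:
    """Clean accuracy (eps=0) or accuracy under the FGSM attack (eps>0)."""
    correct = 0
    for x, y in data:
        x_in = fgsm(E, w, x, y, eps) if eps > 0 else x
        score = sum(wi * fi for wi, fi in zip(w, matvec(E, x_in))) + b
        correct += sign(score) == y
    return correct / len(data)
```

Calling `train_linear_eval(E, data)` gives the LE head and `train_linear_eval(E, data, robust=True)` the r-LE head; both can then be scored with `accuracy` at `eps=0` (clean) and `eps>0` (adversarial).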

Figure panels: Robust contrastive learning training; Linear evaluation (robust linear evaluation, r-LE)

Robustness of RoCL

To verify the efficacy of the proposed RoCL, we introduce robust linear evaluation for self-supervised adversarial learning, and validate our method on benchmark datasets (CIFAR-10 and CIFAR-100) against supervised adversarial learning approaches using ResNet-18 and ResNet-50.

•We obtain robust representations without any labels during training
•Comparable to supervised adversarial learning against white-box attacks
•Significantly better clean accuracy and robustness against some unseen types of attacks
•Comparable to or better than supervised adversarial learning against black-box attacks
•Significantly better robustness in transfer learning

Experimental results with white-box attacks on ResNet-18 trained on the CIFAR-10 and CIFAR-100 datasets.