Team: Shriya Krishnan, Collin Brown, Darek Yu, Kevin Luo, Nathan Cleanthous, Sydni Burse
Faculty: Professor Soheil Feizi and Mazda Moayeri
I4C Teaching Assistant: Lia Arakal
Can we teach computers to see? And do they see the world the same way we do?
Can we use an adversarially trained image classifier to protect against adversarial attacks and generate new images?
We trained a model to classify images from the CIFAR10 dataset. We also learned how to access the pre-trained ResNet model, which is trained on ImageNet, a collection of about one million labeled images spanning 1,000 classes (categories). Each class label names the object in the image, such as a cheeseburger or guacamole. Because the model is trained on this dataset, it can recognize those objects in new images.
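As a sketch of what "accessing a pre-trained model" looks like in practice, the snippet below loads a ResNet-50 from torchvision and classifies a single image. The file name is a placeholder, and the exact model variant we used may differ.

```python
# A minimal sketch of loading a pre-trained ResNet and classifying one image.
# The file name "saxophone.jpg" is a placeholder.
import torch
from PIL import Image
from torchvision import models, transforms

weights = models.ResNet50_Weights.IMAGENET1K_V1   # ImageNet-trained weights
model = models.resnet50(weights=weights)
model.eval()                                       # inference mode

# Standard ImageNet preprocessing: resize, crop, convert to tensor, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("saxophone.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = model(image)
prediction = logits.argmax(dim=1).item()           # index into the 1,000 classes
print(weights.meta["categories"][prediction])      # human-readable class name
```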
The issue with these image classification models is that when a small amount of carefully crafted "noise" is added to a photo, the algorithm badly misclassifies the image. This "noise" consists of changes to pixel values by small, seemingly unpatterned amounts. The changes are imperceptible to humans, but they can cause a drastic reduction in performance, often leading to 0% accuracy under attack for otherwise highly accurate models. They can also cause the model to misclassify images as something else entirely with extremely high confidence, as seen below. To improve these models so that they can still recognize such images correctly, adversarial training is used: the model is trained on images that have been deliberately altered in this way.
These slightly altered images, which produce inaccurate yet highly confident predictions, are the result of adversarial attacks. To defend against them, we can attack our own algorithm: slightly change our training images and try to trick it. By creating these images and training our model on them, the model becomes more robust to the attack and less likely to make mistakes. Adversarial machine learning is an active area of research that can be used to improve deep learning for image classification systems.
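To make the idea concrete, here is a minimal sketch of one of the simplest attacks of this kind, the fast gradient sign method (FGSM). This is an illustration rather than the exact attack we used (our attack is described in the steps below), and the epsilon budget is an illustrative value.

```python
# A minimal FGSM sketch: push every pixel a tiny step in the direction that
# increases the classifier's loss. Assumes `model`, `image` (a batch of pixels
# in [0, 1]), and `label` (true class indices) are already defined, with any
# normalization handled inside `model`.
import torch
import torch.nn.functional as F

epsilon = 4 / 255                                  # illustrative noise budget

image = image.clone().detach().requires_grad_(True)
loss = F.cross_entropy(model(image), label)        # how wrong is the model?
loss.backward()                                    # gradient of loss w.r.t. pixels

# One signed-gradient step, then clamp back to the valid pixel range.
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```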
We can leverage the perceptually aligned gradients of an adversarially trained model to generate new images.
Broccoli image from ImageNet
Model generated broccoli saxophone
Our first step is to create our image classifier. It learns to label images by training on the labeled images from ImageNet.
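A minimal sketch of this step is below, using the CIFAR10 setup mentioned earlier (training on ImageNet follows the same loop, just with a much larger dataset). The architecture, learning rate, batch size, and epoch count are illustrative rather than the exact values we used.

```python
# A minimal sketch of training an image classifier with gradient descent on
# CIFAR10. Architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

train_data = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)            # 10 CIFAR10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)      # how wrong are the predictions?
        loss.backward()                            # gradient of the loss
        optimizer.step()                           # take one gradient-descent step
```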
Our second step is to create our adversarial attack. The code for this section generates noise to add to our image, a change slight enough that a human cannot perceive it, but one that causes the computer to badly misclassify the image.
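A sketch of such an attack using Projected Gradient Descent (the method listed in our summary below) is shown here. The epsilon, step size, and number of steps are illustrative, and `pgd_attack` is just a helper name introduced for this writeup.

```python
# A minimal Projected Gradient Descent (PGD) sketch: repeatedly step the pixels
# in the direction that increases the loss, and project back into a small
# epsilon-ball around the original image so the change stays imperceptible.
import torch
import torch.nn.functional as F

def pgd_attack(model, image, label, epsilon=8 / 255, step_size=2 / 255, steps=10):
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), label)
        grad, = torch.autograd.grad(loss, adv)
        adv = adv.detach() + step_size * grad.sign()            # increase the loss
        adv = image + (adv - image).clamp(-epsilon, epsilon)    # stay within epsilon
        adv = adv.clamp(0, 1)                                   # stay a valid image
    return adv.detach()
```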
Our image classifier was able to correctly classify the image as a saxophone.
If you look closely, you can see in the background that some of the pixels have been changed. These imperceptible changes make a huge difference to the algorithm, causing it to classify this image as an oboe.
Our third step is to use these images to train the model. This increases its robustness and defends it from adversarial attacks.
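A minimal sketch of this adversarial training loop, reusing the names from the earlier sketches (`model`, `loader`, `optimizer`, `loss_fn`, and the `pgd_attack` helper), looks roughly like this:

```python
# A minimal adversarial-training sketch: attack the current model at every
# step and train on the perturbed images instead of the clean ones.
for epoch in range(10):
    for images, labels in loader:
        adv_images = pgd_attack(model, images, labels)   # craft attacks on the fly
        optimizer.zero_grad()
        loss = loss_fn(model(adv_images), labels)        # train on attacked images
        loss.backward()
        optimizer.step()
```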
Our fourth step is to attack the adversarially trained model, which will allow us to generate new images.
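Conceptually, this step runs the same PGD-style loop in reverse: starting from a seed image (like the broccoli photo above), we step the pixels in the direction that makes the adversarially trained model more confident in a chosen target class (like "saxophone"). The sketch below is illustrative; `robust_model`, `seed_image`, `target_class`, and the much larger epsilon and step count are placeholders.

```python
# A minimal sketch of generating an image with a robust model: descend the loss
# toward a chosen target class so the seed image drifts toward that class.
# `robust_model`, `seed_image` (pixels in [0, 1]), and `target_class` are placeholders.
import torch
import torch.nn.functional as F

def generate(robust_model, seed_image, target_class,
             epsilon=0.5, step_size=0.02, steps=200):
    x = seed_image.clone().detach()
    target = torch.tensor([target_class])
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(robust_model(x), target)
        grad, = torch.autograd.grad(loss, x)
        x = x.detach() - step_size * grad.sign()                    # look more like the target
        x = seed_image + (x - seed_image).clamp(-epsilon, epsilon)  # stay near the seed
        x = x.clamp(0, 1)
    return x.detach()
```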
Trained an image classifier with gradient descent
Crafted adversarial attacks on our classifier using Projected Gradient Descent
Generated new images by adversarially attacking an adversarially trained model
The model classified the image as a ballpoint pen
We harnessed adversarial training to change an image from the internet to look like objects from the ImageNet dataset.
Computers are able to analyze images and classify objects with accuracy similar to, if not better than, humans. Just like human brains, neural networks recognize patterns and improve with the right training. However, unlike humans, deep learning models analyze images as pixels and RGB values. We can attack these models by creating "noise" meant to trick them into thinking an object is something different; this is called an adversarial attack. We can then train the model on adversarially attacked images so that it is more robust and able to classify the objects correctly. We can then, once again, adversarially attack these models to generate new images. Adversarial machine learning is an open field of research, and researchers are still exploring ways to better defend against adversarial attacks. This research will be extremely useful in making image classification models more accurate and reliable. For example, if a self-driving car were to come across a stop sign that had been vandalized, an adversarially trained model would help it recognize it as a stop sign. We are excited to see the future of this AI technology!