ATTACK: Adversarial Training
How can image processing models become less vulnerable to adversarial attacks and how can we improve the accuracy of these models?
Image Classification:
Image classification models are like intelligent machines that can look at pictures and tell us what's in them. They can recognize objects, animals, or even people in images. But sometimes, these models can be tricked or fooled by adding some noise or changes to the pictures.
In short, an image classification model takes an image as input, runs it through its algorithm (in this case a neural network), and outputs the class it thinks the image belongs to, chosen from the fixed set of classes the model was trained on.
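As a minimal sketch of that pipeline (assuming PyTorch and torchvision are installed; "chihuahua.jpg" is only a placeholder filename):

    # A minimal sketch, not a specific application's code.
    import torch
    from PIL import Image
    from torchvision import models

    weights = models.ResNet18_Weights.DEFAULT
    model = models.resnet18(weights=weights)
    model.eval()  # inference mode

    # Preprocess the image the same way the model was trained (resize, crop, normalize).
    preprocess = weights.transforms()
    image = preprocess(Image.open("chihuahua.jpg")).unsqueeze(0)  # add a batch dimension

    # The network outputs one score (logit) per class; the class with the largest score wins.
    with torch.no_grad():
        logits = model(image)
    predicted_class = weights.meta["categories"][logits.argmax(dim=1).item()]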
What is an adversarial attack?
An adversarial attack is a function or process intentionally designed to lower a model's accuracy. Rather than changing the model itself, the attack changes the model's input: the image's pixel values are altered by a tiny amount until the model predicts the class incorrectly. Probing a model this way also reveals where its weaknesses lie.
Normally, the model correctly identifies the image as a Chihuahua.
However, just by adding a little bit of background noise to the picture...
The output completely changes, yet the image is practically the same!
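One well-known way to add that kind of noise is the fast gradient sign method (FGSM). A minimal sketch, assuming a PyTorch classifier and an image tensor with pixel values in [0, 1] (fgsm_noise is just an illustrative name):

    import torch.nn.functional as F

    def fgsm_noise(model, image, true_label, eps=0.01):
        # Nudge every pixel by +/- eps in the direction that increases the
        # loss on the correct class -- a tiny change per pixel, but one that
        # is aimed precisely at the model's weak spots.
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), true_label)
        loss.backward()
        adversarial = image + eps * image.grad.sign()
        return adversarial.clamp(0, 1).detach()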
Types of Attacks:
Targeted attacks: Targeted attacks pick a "goal" class (the class we try to make the model incorrectly predict) and continuously alter the image by tiny amounts until the target class is guessed. They do this by measuring the "distance" from the target class with a loss function, using the gradient of that loss to determine the "direction" the steps need to be taken (how to change the pixel values), then iteratively taking "steps" toward the target until the image has been changed enough to reach the target class. Targeted attacks are best to use when you want to manipulate the model toward a specific output that benefits you.
Untargeted attacks: Untargeted attacks work in the opposite direction. Instead of steering the prediction toward a chosen class, they push the image "away" from its correct class, taking gradient steps that increase the model's loss until any wrong class is predicted. This attack is best at identifying general weaknesses in the model. (A sketch contrasting the two directions appears below.)
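The only real difference between the two is the direction of each gradient step. As a rough illustration (attack_step is a hypothetical helper, assuming a PyTorch classifier):

    import torch.nn.functional as F

    def attack_step(model, image, label, step_size, targeted):
        # For a targeted attack, `label` is the goal class and we step *down*
        # the loss (toward it); for an untargeted attack, `label` is the true
        # class and we step *up* the loss (away from it).
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        direction = -1.0 if targeted else 1.0
        return (image + direction * step_size * image.grad.sign()).detach()

A targeted attack would repeatedly call this with targeted=True and the goal class; an untargeted attack would call it with targeted=False and the true class.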
What do adversarial attacks look like?
The main parameters used to effectively create an adversarial attack are epsilon (eps), step size, and number of steps.
"Epsilon" or "eps": determines how much the model can be changed before its prediction flips. If the changes made to the picture are within this epsilon range, the model's prediction shouldn't change much. specifies how close points should be to each other to be considered a part of a cluster.
Step size: the magnitude of each individual adjustment made to the image while creating the adversarial example. It determines how large or small each change to the original image is.
Number of steps: how many of those adjustments are applied in total (see the sketch below for how the three parameters fit together).
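Putting the three parameters together, an iterative attack in the style of projected gradient descent (PGD) might be sketched like this; the function name pgd_attack and the default values are illustrative, and pixel values are assumed to lie in [0, 1]:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, image, true_label, eps=0.03, step_size=0.007, num_steps=10):
        original = image.clone().detach()
        adv = original.clone()
        for _ in range(num_steps):
            adv.requires_grad_(True)
            loss = F.cross_entropy(model(adv), true_label)
            loss.backward()
            with torch.no_grad():
                adv = adv + step_size * adv.grad.sign()             # one step away from the true class
                adv = original + (adv - original).clamp(-eps, eps)  # never exceed the eps budget per pixel
                adv = adv.clamp(0, 1)                               # keep pixel values valid
        return adv.detach()

A larger eps gives the attack more room to change the image, while a smaller step size with more steps makes the search finer-grained.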
Adversarial training to fight adversarial attacks.
The current best way to stop these attacks is to train the model on adversarial images in addition to clean ones (a rough sketch of this training loop appears below). It has real drawbacks, though:
It is an inefficient way to fix the problem.
It needs large amounts of extra training data for virtually no return.
Some argue it does not matter much yet, since AI is not widely used for sensitive use cases.
It lowers accuracy on normal, unperturbed images.
On adversarial inputs, however, the adversarially trained model appears more accurate than a normally trained one: because it was trained to recognize perturbed images, it is far more resilient to the "attacks."
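As a rough illustration of that idea, one epoch of adversarial training might look like the sketch below; model, train_loader, optimizer, and the attack function (for example the pgd_attack sketch above) are assumed to exist and are not tied to any specific library:

    import torch.nn.functional as F

    def adversarial_training_epoch(model, train_loader, optimizer, attack):
        model.train()
        for images, labels in train_loader:
            # Generate adversarial versions of this batch on the fly...
            adv_images = attack(model, images, labels)
            # ...then train on them so the model learns to classify perturbed
            # images correctly. (Mixing in clean images as well is common.)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(adv_images), labels)
            loss.backward()
            optimizer.step()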
What does this tell us?
When trying to attack a more robust model, the images have to be altered greatly before the model predicts the target class. This can change the image in huge ways, interestingly showing us what the model looks at to classify an image.
This is the starting image of the duck; the attack will gradually change the image to look more like a bee.
These changes tell us that the model "looks for" flowers when identifying bees, since most images of bees also contain flowers.
This is a real-world use of adversarial attacks: the AI model is trained to detect and highlight people, but with the adversarial images printed on their shirts, the model fails to recognize the people standing there.
Adversarial stop signs: small patches or graffiti that look harmless to humans can be added to a stop sign, causing an AI model to stop recognizing it, which could easily be used to cause a crash.
INTERESTING ATTACKS WE'VE ENCOUNTERED...