This project explores model robustness against adversarial patches: how well a machine-learning model holds up when a small region of the input, such as a sticker placed on an image, is adversarially modified (a patch attack).
In my experiments, I certified two major architectures:
Convolutional Neural Network (CNN): ResNet18
Vision Transformer (ViT)
on the CIFAR-10 (10 classes) and ImageNet1K (1,000 classes) datasets, yielding 63% certified robust accuracy with the ViT model and 61% with the ResNet18 model.
Base Paper: Certified Patch Robustness via Smoothed Vision Transformers
The left image contains an adversarial patch (attacked image); the right image is the original, patch-free image.
While ensuring model robustness with formal proofs (math-heavy) can be fun, the more captivating aspect lies in verifying models throughout their entire lifecycle, from training to deployment.
This study shows that integrating ViTs into the smoothing framework improves certified robustness against adversarial patches.
This gain comes while maintaining standard accuracy comparable to regular (non-robust) models, with much faster inference than earlier smoothed models. Together, these improvements make certifiably patch-robust models a practical alternative to standard ones.
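One reason the smoothed ViT is fast: under a column ablation, most of the image is masked, and the ViT can simply drop the fully-masked patch tokens before the transformer encoder, processing only the tokens that overlap the visible column. Below is a minimal sketch of that token-selection arithmetic; the 224×224 input, 16-pixel ViT patches, and 19-pixel ablation width are illustrative assumptions, not values taken from this post.

```python
def kept_columns(img_w=224, patch=16, ab_start=0, ab_width=19):
    """Return the ViT patch-grid column indices that overlap a vertical
    ablation band [ab_start, ab_start + ab_width); all other columns of
    tokens are fully masked and can be dropped before the encoder."""
    keep = []
    for c in range(img_w // patch):          # 14 columns of 16px patches
        left, right = c * patch, (c + 1) * patch
        if left < ab_start + ab_width and right > ab_start:
            keep.append(c)
    return keep

keep = kept_columns()
n_tokens = (224 // 16) * len(keep)           # tokens actually fed to the encoder
print(keep)                                  # [0, 1]
print(n_tokens, "of", (224 // 16) ** 2)      # 28 of 196
```

Since the encoder's cost grows roughly quadratically with token count, cutting 196 tokens down to 28 per ablation is what keeps smoothed inference fast despite classifying many ablations per image.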
The study also demonstrates the technique's real-world applicability on a custom chair dataset and evaluates performance in physical settings by printing posters of attacked and original images.
To verify patch robustness on a real-world dataset, both the ViT and CNN models are tested against a small dataset of 1,000 indoor images.
Data was collected with an Intel RealSense camera, and I preprocessed the frames with a Fast RCNN object detector to extract crops of the chair object class.
The model was fine-tuned on a 60% split of the chair data and achieved 80% certified robust accuracy against patches of size 16.
While focusing on a single class may make this experiment seem biased, my main objective was to establish that careful training can ensure robustness on real-world data.
Below are some examples of patch attacks with patch sizes 16 and 32.
In a physical-world test, I photographed the printed images, both clean and attacked, with my laptop camera, resulting in a robust accuracy of 20%.
It became evident that printing out an adversarial attack is not straightforward: factors such as the printer's color shifts and lighting effects were not accounted for.
These discrepancies in the numbers are reasonable; the primary goal of this test was to observe how an attack performs in a real-world setting.
In this example, an image of a duck is divided into eight column ablations, each classified individually. Since six of the eight column ablations are identified as a duck, the smoothed model predicts the overall image as a duck.
A small patch (size 32) can overlap at most two of these columns, so it cannot change the majority vote; the prediction is therefore certifiably robust against patches of size 32.
Conversely, a patch of size 64 can influence up to three column ablations, potentially flipping the most frequent class to boat, so the model is not certifiably robust against patches of size 64.
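The vote-and-certify logic above fits in a few lines. The sketch below takes `delta`, the maximum number of ablations a patch can touch, as a given parameter, and assumes ties break toward the majority class, which matches the duck example's treatment of the size-32 patch; the exact tie-breaking rule in the base paper may differ.

```python
from collections import Counter

def smoothed_predict(preds, delta):
    """Majority vote over per-ablation predictions. The vote is certified
    if a patch touching at most `delta` ablations cannot flip it: in the
    worst case the patch removes `delta` votes from the winner and adds
    `delta` to the runner-up. Ties are assumed to favor the majority class."""
    ranked = Counter(preds).most_common()
    top, n_top = ranked[0]
    n_second = ranked[1][1] if len(ranked) > 1 else 0
    certified = (n_top - delta) >= (n_second + delta)
    return top, certified

votes = ["duck"] * 6 + ["boat"] * 2      # 6 of 8 column ablations say duck
print(smoothed_predict(votes, delta=2))  # ('duck', True): size-32 patch, <=2 columns
print(smoothed_predict(votes, delta=3))  # ('duck', False): size-64 patch, <=3 columns
```

The prediction itself never changes with `delta`; only the certificate does, which is why the same 6-to-2 vote is certified against size-32 patches but not size-64 ones.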
More broadly, this line of work aims to understand why a model makes a particular prediction, given how inherently hard machine-learning models are to interpret.
In verifiable machine learning, researchers continually devise new adversarial attacks, which intentionally manipulate input data to mislead a model, and develop robust defenses against them.
Incorporating ViTs into the smoothing framework makes models certifiably robust to adversarial patches, positioning them as a practical alternative to standard non-robust models. This work shows the approach holds up in real-world scenarios and physical settings.