Author: Sunjun Hwang†, Hongjoon Jun*, Sunje Keum**
KIISE Summer Conference 2025 – Undergraduate Paper Competition
Korean Institute of Information Technology Summer Conference, June 2025, Maison Glad Jeju
© 2025 KIISE
In recent years, deep learning-based image classification models have achieved remarkable performance and have been widely applied in various fields.
In particular, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have demonstrated outstanding accuracy and are used in areas ranging from autonomous driving to medical imaging.
However, these models remain vulnerable to adversarial attacks.
An adversarial attack introduces perturbations to input images that are nearly invisible to the human eye yet mislead the model into producing incorrect predictions.
This vulnerability poses a serious security and reliability concern for real-world AI applications.
Representative adversarial attack methods include:
FGSM (Fast Gradient Sign Method)
A fast one-step attack that perturbs the input in the direction of the gradient’s sign.
Simple, but weaker than iterative methods.
PGD (Projected Gradient Descent)
A multi-step iterative attack based on FGSM that performs stronger perturbations.
Each step is projected back into an ε-ball around the original image to stay within the allowed perturbation limit.
CW (Carlini & Wagner Attack)
An optimization-based attack that minimizes perturbation magnitude while forcing misclassification.
Known for being difficult to defend against without specific countermeasures.
To address these vulnerabilities, Adversarial Training has been introduced.
By incorporating adversarial examples during training, the model can learn to resist various types of attacks, improving its robustness without significantly sacrificing performance on clean data.
This study focuses on evaluating the robustness of a Vision Transformer (ViT-B32) against multiple adversarial attack methods and demonstrates how adversarial training can enhance the model’s resilience in practical scenarios.
The primary objectives of this study are:
Evaluate the vulnerability of Vision Transformer (ViT-B32)
Assess how susceptible the ViT-B32 model is to different adversarial attacks, including FGSM, PGD, and CW.
Analyze the impact of multi-adversarial attacks
Determine how individual attacks and their combinations (e.g., FGSM+PGD) affect classification accuracy.
Enhance robustness through Adversarial Training
Train the model with adversarial examples and measure improvements in resilience without significant loss of clean-data performance.
This study investigates three major adversarial attack methods and their influence on the robustness of ViT models.
1. FGSM (Fast Gradient Sign Method)
FGSM perturbs the input by following the sign of the gradient of the loss with respect to the input.
This attack is fast and simple but generally weaker than iterative methods.
FGSM generates the adversarial example as
x_adv = x + ε · sign(∇_x J(θ, x, y))
Where:
x is the original input image
x_adv is the adversarial example
ε controls the perturbation strength
J(θ,x,y) is the loss function with parameters θ
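As a concrete illustration, a minimal PyTorch sketch of this one-step update is shown below (a hypothetical helper, not the authors' code; it assumes image tensors scaled to [0, 1] and a cross-entropy loss):

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: move each pixel by eps along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(θ, x, y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # x_adv = x + ε · sign(∇_x J)
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```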
2. PGD (Projected Gradient Descent)
PGD can be seen as an iterative version of FGSM, performing multiple small perturbations and projecting the result back into the ε-ball around the original input to remain within the allowed perturbation limit.
Each iteration updates the adversarial example as
x_adv^(t+1) = Clip_{x,ε}( x_adv^(t) + α · sign(∇_x J(θ, x_adv^(t), y)) )
Where:
α is the step size per iteration
Clip ensures the perturbation stays within the allowed ε-ball
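A minimal PyTorch sketch of this iterative procedure (hypothetical helper, assuming an L∞ threat model, inputs in [0, 1], and a random start inside the ε-ball as in Madry et al.):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Iterative FGSM with projection back into the L-infinity eps-ball."""
    x_orig = x.clone().detach()
    # Random start inside the eps-ball, then clip to valid pixel values.
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project: keep the total perturbation within [-eps, eps] and pixels in [0, 1].
            x_adv = (x_orig + (x_adv - x_orig).clamp(-eps, eps)).clamp(0.0, 1.0)
    return x_adv.detach()
```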
3. CW (Carlini & Wagner) Attack
CW is an optimization-based attack designed to find the smallest possible perturbation that causes misclassification.
The attack solves the optimization problem
minimize ‖δ‖ + c · f(x + δ), subject to x + δ remaining a valid image
Where:
δ is the perturbation added to the original input
c balances perturbation magnitude and attack success rate
f(⋅) is a function ensuring misclassification
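A simplified, untargeted L2 variant of this optimization could look like the sketch below (a hypothetical helper using the tanh change of variables from the original CW paper; it omits the binary search over c and other refinements):

```python
import torch
import torch.nn.functional as F

def cw_l2_attack(model, x, y, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Simplified untargeted CW-L2 attack (no binary search over c)."""
    x = x.clone().detach()
    # Optimize in tanh space so the adversarial image always stays in [0, 1].
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        logits = model(x_adv)
        # f(x+δ): margin between the true-class logit and the best other-class logit.
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        other_logit = logits.masked_fill(
            F.one_hot(y, logits.size(1)).bool(), float("-inf")
        ).max(1).values
        misclass = (true_logit - other_logit + kappa).clamp(min=0)
        # Objective: squared L2 perturbation + c · f(x + δ), summed over the batch.
        loss = ((x_adv - x) ** 2).flatten(1).sum(1).sum() + c * misclass.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return ((torch.tanh(w) + 1) / 2).detach()
```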
4. Adversarial Training
To improve model robustness, adversarial examples generated from FGSM, PGD, and CW attacks are included in the training process.
By learning from these challenging samples, the model gains resilience against attacks while maintaining reasonable accuracy on clean images.
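A sketch of one such mixed training loop in PyTorch is given below (hypothetical; it reuses the fgsm_attack and pgd_attack helpers sketched above, shows only the FGSM+PGD combination, and the hyperparameter values and 1:1:1 mixing ratio are illustrative assumptions, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, device,
                               eps=8/255, alpha=2/255, steps=10):
    """One epoch of mixed clean + adversarial training (FGSM + PGD example)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Generate adversarial counterparts of the current batch.
        x_fgsm = fgsm_attack(model, x, y, eps)
        x_pgd = pgd_attack(model, x, y, eps, alpha, steps)
        # Train on clean and adversarial examples together.
        x_mix = torch.cat([x, x_fgsm, x_pgd], dim=0)
        y_mix = torch.cat([y, y, y], dim=0)
        loss = F.cross_entropy(model(x_mix), y_mix)
        optimizer.zero_grad()  # clears any gradients left over from attack generation
        loss.backward()
        optimizer.step()
```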
The ViT-B32 (Vision Transformer, Base with 32×32 patches) follows a simple yet powerful pipeline:
1. Input Image (224×224×3)
The original image is divided into 32×32 patches, resulting in 7×7 = 49 patches.
2. Patch Embedding & Linear Projection
Each patch is flattened and projected into a 768-dimensional embedding.
A [CLS] token is prepended for classification purposes.
Positional embeddings are added to retain spatial information lost by flattening.
3. Transformer Encoder (×12)
Each encoder layer consists of:
Multi-Head Self-Attention (MSA)
Feed-Forward MLP
Layer Normalization & Residual Connections
These layers allow the model to capture long-range dependencies across the image.
4. Classification Head (MLP)
The final [CLS] token is fed into a small MLP head to produce class probabilities.
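For reference, a ViT-B/32 backbone with a CIFAR-10 classification head can be set up roughly as follows (a sketch using torchvision's pretrained weights; the paper does not specify which implementation or pretraining was used):

```python
import torch.nn as nn
from torchvision.models import vit_b_32, ViT_B_32_Weights

# Load an ImageNet-pretrained ViT-B/32 (12 encoder layers, 768-dim embeddings)
# and replace the classification head with one for the 10 CIFAR-10 classes.
model = vit_b_32(weights=ViT_B_32_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 10)
```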
Vision Transformers (ViTs) process images in a global, attention-based manner, which may make them:
Less sensitive to local pixel noise than CNNs
More vulnerable to structured attacks that target attention mechanisms
Adversarial Training is applied to improve the robustness of ViT models against attacks like:
FGSM (Fast Gradient Sign Method) – a fast, single-step perturbation
PGD (Projected Gradient Descent) – iterative and stronger perturbation
CW (Carlini & Wagner) – optimization-based and highly effective attack
By including adversarial samples during training, the ViT-B32 can learn to resist these attacks while maintaining accuracy on clean data.
This study evaluated the robustness of the ViT-B32 model against multiple adversarial attacks using the CIFAR-10 dataset.
1. Model and Dataset
Model: Vision Transformer (ViT-B32)
Dataset: CIFAR-10
60,000 images, 10 classes
50,000 for training and 10,000 for testing
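The dataset can be loaded with torchvision as sketched below (the resize to 224×224 and the batch size are assumptions; the paper does not state its exact preprocessing):

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# CIFAR-10 images are 32x32 and are resized to the 224x224 input expected by ViT-B/32.
transform = T.Compose([T.Resize(224), T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)
```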
2. Training Configurations
Two main training regimes were considered:
Clean Training – trained only on the original CIFAR-10 dataset
Adversarial Training – trained on a mixture of clean images and adversarial examples
3. Adversarial Attack Methods
FGSM (Fast Gradient Sign Method) – quick, single-step perturbation
PGD (Projected Gradient Descent) – multi-step iterative attack
CW (Carlini & Wagner) – optimization-based, highly effective attack
4. Attack Combinations for Training
The model was trained under eight different configurations:
Clean Only
FGSM
PGD
CW
FGSM + PGD
FGSM + CW
PGD + CW
FGSM + PGD + CW
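One simple way to encode these eight configurations is a mapping from a configuration name to the attacks whose adversarial examples are mixed into each training batch (the identifiers below are hypothetical, not the authors' own):

```python
# Attacks whose adversarial examples are added to each training batch.
TRAINING_CONFIGS = {
    "clean_only":  [],
    "fgsm":        ["fgsm"],
    "pgd":         ["pgd"],
    "cw":          ["cw"],
    "fgsm_pgd":    ["fgsm", "pgd"],
    "fgsm_cw":     ["fgsm", "cw"],
    "pgd_cw":      ["pgd", "cw"],
    "fgsm_pgd_cw": ["fgsm", "pgd", "cw"],
}
```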
5. Evaluation Metrics
Accuracy on:
Clean images
FGSM-attacked images
PGD-attacked images
CW-attacked images
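Clean and robust accuracy can be measured with a loop like the sketch below (a hypothetical helper; attack_fn would be one of the attack functions sketched earlier, with its hyperparameters fixed in advance):

```python
import torch

def accuracy(model, loader, device, attack_fn=None):
    """Top-1 accuracy on clean images, or on attacked images if attack_fn is given."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack_fn is not None:
            x = attack_fn(model, x, y)  # attacks need gradients, so no torch.no_grad() here
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```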
1. Clean-Only Training
Achieved 92.9% accuracy on clean images.
Dropped sharply to 42.7% on FGSM and 23.8% on PGD.
Observation: The model is highly vulnerable without adversarial training.
2. FGSM + PGD Adversarial Training
Maintained clean accuracy at ~93% (minimal degradation).
FGSM and PGD attack accuracy improved to above 80%.
Observation: This setup provides the best overall robustness.
3. CW Attack Performance
Models not trained with CW samples showed unstable defense.
Including CW samples in training improved robustness, but sometimes decreased performance against other attacks.
Key Insights:
Multi-adversarial training improves ViT robustness without significant loss on clean data.
FGSM+PGD provides a balanced and strong defense.
CW attacks require dedicated training for effective defense.
This study empirically demonstrates the effectiveness of adversarial training for enhancing the robustness of Vision Transformers (ViTs) against multiple types of attacks.
Key conclusions are:
ViT models are highly vulnerable to adversarial attacks without robust training, even if they maintain high clean accuracy.
Multi-adversarial training, particularly FGSM + PGD, achieves the best balance between clean accuracy and robustness against gradient-based attacks.
CW attacks are especially challenging, requiring dedicated training or specialized defense strategies to maintain performance.
The results provide practical insights for real-world AI systems, where security and reliability are crucial, such as in:
Autonomous driving
Medical image diagnosis
Industrial inspection and surveillance
Significance for the Field:
This work highlights that adversarial training remains one of the most practical and effective defense strategies.
The findings can guide the design of secure and reliable ViT-based image classification systems.
By showing the performance gap between different attacks and defenses, the study also motivates future research into hybrid or ensemble defense techniques.
From the authors' perspective:
Real-World Implication
The study confirms that clean accuracy alone is not sufficient for deploying AI in safety-critical environments.
Robustness evaluation should always be part of the development process for ViT-based models.
FGSM+PGD as a Practical Baseline
This combination provides a strong and relatively training-efficient defense, making it a good starting point for applications where computational resources are limited.
CW Attack Challenges
CW remains the most difficult attack to defend against, suggesting that hybrid adversarial training or advanced techniques like defensive distillation may be necessary for production-level robustness.
Future Research Directions
Explore multi-stage training, where simple attacks (FGSM/PGD) are used first, followed by highly optimized attacks like CW for fine-tuning robustness.
Investigate attention-aware defense strategies, leveraging the unique structure of ViTs.