Author: Sunjun Hwang†, Hongjoon Jun*, Sunje Keum**
KIISE Summer Conference 2025 – Undergraduate Paper Competition
Korean Institute of Information Technology Summer Conference, June 2025, Maison Glad Jeju
© 2025 KIISE
In recent years, deep learning-based image classification models have achieved remarkable performance and have been widely applied in various fields.
In particular, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have demonstrated outstanding accuracy and are used in areas ranging from autonomous driving to medical imaging.
However, these models remain vulnerable to adversarial attacks.
An adversarial attack introduces perturbations to input images that are nearly invisible to the human eye yet mislead the model into producing incorrect predictions.
This vulnerability poses a serious security and reliability concern for real-world AI applications.
Representative adversarial attack methods include:
FGSM (Fast Gradient Sign Method)
A fast one-step attack that perturbs the input in the direction of the gradient’s sign.
Simple, but weaker than iterative methods.
PGD (Projected Gradient Descent)
A multi-step iterative attack based on FGSM that performs stronger perturbations.
Each step is projected back into an ε-ball around the original image to stay within the allowed perturbation limit.
CW (Carlini & Wagner Attack)
An optimization-based attack that minimizes perturbation magnitude while forcing misclassification.
Known for being difficult to defend against without specific countermeasures.
To address these vulnerabilities, Adversarial Training has been introduced.
By incorporating adversarial examples during training, the model can learn to resist various types of attacks, improving its robustness without significantly sacrificing performance on clean data.
This study focuses on evaluating the robustness of a Vision Transformer (ViT-B32) against multiple adversarial attack methods and demonstrates how adversarial training can enhance the model’s resilience in practical scenarios.
The primary objectives of this study are:
Evaluate the vulnerability of Vision Transformer (ViT-B32)
Assess how susceptible the ViT-B32 model is to different adversarial attacks, including FGSM, PGD, and CW.
Analyze the impact of multi-adversarial attacks
Determine how individual attacks and their combinations (e.g., FGSM+PGD) affect classification accuracy.
Enhance robustness through Adversarial Training
Train the model with adversarial examples and measure improvements in resilience without significant loss of clean-data performance.
This study investigates three major adversarial attack methods and their influence on the robustness of ViT models.
1. FGSM (Fast Gradient Sign Method)
FGSM perturbs the input by following the sign of the gradient of the loss with respect to the input.
This attack is fast and simple but generally weaker than iterative methods.
FGSM generates the adversarial example as
x_adv = x + ε · sign(∇_x J(θ, x, y))
Where:
x is the original input image
x_adv is the adversarial example
ε controls the perturbation strength
J(θ,x,y) is the loss function with parameters θ
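As a concrete illustration, a minimal PyTorch sketch of this one-step update is shown below (a hypothetical helper, not the authors' code; it assumes image tensors scaled to [0, 1] and a cross-entropy loss):

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: move each pixel by eps along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(θ, x, y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # x_adv = x + ε · sign(∇_x J)
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```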
2. PGD (Projected Gradient Descent)
PGD can be seen as an iterative version of FGSM, performing multiple small perturbations and projecting the result back into the ε-ball around the original input to remain within the allowed perturbation limit.
Each iteration updates the adversarial example as
x_adv^(t+1) = Clip_{x,ε}( x_adv^(t) + α · sign(∇_x J(θ, x_adv^(t), y)) )
Where:
α is the step size per iteration
Clip ensures the perturbation stays within the allowed ε-ball
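A minimal PyTorch sketch of this iterative procedure (hypothetical helper, assuming an L∞ threat model, inputs in [0, 1], and a random start inside the ε-ball as in Madry et al.):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Iterative FGSM with projection back into the L-infinity eps-ball."""
    x_orig = x.clone().detach()
    # Random start inside the eps-ball, then clip to valid pixel values.
    x_adv = (x_orig + torch.empty_like(x_orig).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project: keep the total perturbation within [-eps, eps] and pixels in [0, 1].
            x_adv = (x_orig + (x_adv - x_orig).clamp(-eps, eps)).clamp(0.0, 1.0)
    return x_adv.detach()
```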
3. CW (Carlini & Wagner) Attack
CW is an optimization-based attack designed to find the smallest possible perturbation that causes misclassification.
The attack solves the optimization problem
minimize ‖δ‖ + c · f(x + δ), subject to x + δ remaining a valid image
Where:
δ is the perturbation added to the original input
c balances perturbation magnitude and attack success rate
f(⋅) is a function ensuring misclassification
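A simplified, untargeted L2 variant of this optimization could look like the sketch below (a hypothetical helper using the tanh change of variables from the original CW paper; it omits the binary search over c and other refinements):

```python
import torch
import torch.nn.functional as F

def cw_l2_attack(model, x, y, c=1.0, steps=100, lr=0.01, kappa=0.0):
    """Simplified untargeted CW-L2 attack (no binary search over c)."""
    x = x.clone().detach()
    # Optimize in tanh space so the adversarial image always stays in [0, 1].
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = (torch.tanh(w) + 1) / 2
        logits = model(x_adv)
        # f(x+δ): margin between the true-class logit and the best other-class logit.
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        other_logit = logits.masked_fill(
            F.one_hot(y, logits.size(1)).bool(), float("-inf")
        ).max(1).values
        misclass = (true_logit - other_logit + kappa).clamp(min=0)
        # Objective: squared L2 perturbation + c · f(x + δ), summed over the batch.
        loss = ((x_adv - x) ** 2).flatten(1).sum(1).sum() + c * misclass.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return ((torch.tanh(w) + 1) / 2).detach()
```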
4. Adversarial Training
To improve model robustness, adversarial examples generated from FGSM, PGD, and CW attacks are included in the training process.
By learning from these challenging samples, the model gains resilience against attacks while maintaining reasonable accuracy on clean images.
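A sketch of one such mixed training loop in PyTorch is given below (hypothetical; it reuses the fgsm_attack and pgd_attack helpers sketched above, shows only the FGSM+PGD combination, and the hyperparameter values and 1:1:1 mixing ratio are illustrative assumptions, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, device,
                               eps=8/255, alpha=2/255, steps=10):
    """One epoch of mixed clean + adversarial training (FGSM + PGD example)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Generate adversarial counterparts of the current batch.
        x_fgsm = fgsm_attack(model, x, y, eps)
        x_pgd = pgd_attack(model, x, y, eps, alpha, steps)
        # Train on clean and adversarial examples together.
        x_mix = torch.cat([x, x_fgsm, x_pgd], dim=0)
        y_mix = torch.cat([y, y, y], dim=0)
        loss = F.cross_entropy(model(x_mix), y_mix)
        optimizer.zero_grad()  # clears any gradients left over from attack generation
        loss.backward()
        optimizer.step()
```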
The ViT-B32 (Vision Transformer, Base with 32×32 patches) follows a simple yet powerful pipeline:
1. Input Image (224×224×3)
The original image is divided into 32×32 patches, resulting in 7×7 = 49 patches.
2. Patch Embedding & Linear Projection
Each patch is flattened and projected into a 768-dimensional embedding.
A [CLS] token is prepended for classification purposes.
Positional embeddings are added to retain spatial information lost by flattening.
3. Transformer Encoder (×12)
Each encoder layer consists of:
Multi-Head Self-Attention (MSA)
Feed-Forward MLP
Layer Normalization & Residual Connections
These layers allow the model to capture long-range dependencies across the image.
4. Classification Head (MLP)
The final [CLS] token is fed into a small MLP head to produce class probabilities.
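For reference, a ViT-B/32 backbone with a CIFAR-10 classification head can be set up roughly as follows (a sketch using torchvision's pretrained weights; the paper does not specify which implementation or pretraining was used):

```python
import torch.nn as nn
from torchvision.models import vit_b_32, ViT_B_32_Weights

# Load an ImageNet-pretrained ViT-B/32 (12 encoder layers, 768-dim embeddings)
# and replace the classification head with one for the 10 CIFAR-10 classes.
model = vit_b_32(weights=ViT_B_32_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 10)
```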
Vision Transformers (ViTs) process images in a global, attention-based manner, which may make them:
Less sensitive to local pixel noise than CNNs
More vulnerable to structured attacks that target attention mechanisms
Adversarial Training is applied to improve the robustness of ViT models against attacks like:
FGSM (Fast Gradient Sign Method) – a fast, single-step perturbation
PGD (Projected Gradient Descent) – iterative and stronger perturbation
CW (Carlini & Wagner) – optimization-based and highly effective attack
By including adversarial samples during training, the ViT-B32 can learn to resist these attacks while maintaining accuracy on clean data.
This study evaluated the robustness of the ViT-B32 model against multiple adversarial attacks using the CIFAR-10 dataset.
1. Model and Dataset
Model: Vision Transformer (ViT-B32)
Dataset: CIFAR-10
60,000 images, 10 classes
50,000 for training and 10,000 for testing
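The dataset can be loaded with torchvision as sketched below (the resize to 224×224 and the batch size are assumptions; the paper does not state its exact preprocessing):

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# CIFAR-10 images are 32x32 and are resized to the 224x224 input expected by ViT-B/32.
transform = T.Compose([T.Resize(224), T.ToTensor()])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)
```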
2. Training Configurations
Two main training regimes were considered:
Clean Training – trained only on the original CIFAR-10 dataset
Adversarial Training – trained on a mixture of clean images and adversarial examples
3. Adversarial Attack Methods
FGSM (Fast Gradient Sign Method) – quick, single-step perturbation
PGD (Projected Gradient Descent) – multi-step iterative attack
CW (Carlini & Wagner) – optimization-based, highly effective attack
4. Attack Combinations for Training
The model was trained under eight different configurations:
Clean Only
FGSM
PGD
CW
FGSM + PGD
FGSM + CW
PGD + CW
FGSM + PGD + CW
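One simple way to encode these eight configurations is a mapping from a configuration name to the attacks whose adversarial examples are mixed into each training batch (the identifiers below are hypothetical, not the authors' own):

```python
# Attacks whose adversarial examples are added to each training batch.
TRAINING_CONFIGS = {
    "clean_only":  [],
    "fgsm":        ["fgsm"],
    "pgd":         ["pgd"],
    "cw":          ["cw"],
    "fgsm_pgd":    ["fgsm", "pgd"],
    "fgsm_cw":     ["fgsm", "cw"],
    "pgd_cw":      ["pgd", "cw"],
    "fgsm_pgd_cw": ["fgsm", "pgd", "cw"],
}
```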
5. Evaluation Metrics
Accuracy on:
Clean images
FGSM-attacked images
PGD-attacked images
CW-attacked images
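Clean and robust accuracy can be measured with a loop like the sketch below (a hypothetical helper; attack_fn would be one of the attack functions sketched earlier, with its hyperparameters fixed in advance):

```python
import torch

def accuracy(model, loader, device, attack_fn=None):
    """Top-1 accuracy on clean images, or on attacked images if attack_fn is given."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if attack_fn is not None:
            x = attack_fn(model, x, y)  # attacks need gradients, so no torch.no_grad() here
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```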
1. Clean-Only Training
Achieved 92.9% accuracy on clean images.
Dropped sharply to 42.7% on FGSM and 23.8% on PGD.
Observation: The model is highly vulnerable without adversarial training.
2. FGSM + PGD Adversarial Training
Maintained clean accuracy at ~93% (minimal degradation).
FGSM and PGD attack accuracy improved to above 80%.
Observation: This setup provides the best overall robustness.
3. CW Attack Performance
Models not trained with CW samples showed unstable defense.
Including CW samples in training improved robustness, but sometimes decreased performance against other attacks.
Key Insights:
Multi-adversarial training improves ViT robustness without significant loss on clean data.
FGSM+PGD provides a balanced and strong defense.
CW attacks require dedicated training for effective defense.
This study empirically demonstrates the effectiveness of adversarial training for enhancing the robustness of Vision Transformers (ViTs) against multiple types of attacks.
Key conclusions are:
ViT models are highly vulnerable to adversarial attacks without robust training, even if they maintain high clean accuracy.
Multi-adversarial training, particularly FGSM + PGD, achieves the best balance between clean accuracy and robustness against gradient-based attacks.
CW attacks are especially challenging, requiring dedicated training or specialized defense strategies to maintain performance.
The results provide practical insights for real-world AI systems, where security and reliability are crucial, such as in:
Autonomous driving
Medical image diagnosis
Industrial inspection and surveillance
Significance for the Field:
This work highlights that adversarial training remains one of the most practical and effective defense strategies.
The findings can guide the design of secure and reliable ViT-based image classification systems.
By showing the performance gap between different attacks and defenses, the study also motivates future research into hybrid or ensemble defense techniques.
From the authors' perspective:
Real-World Implication
The study confirms that clean accuracy alone is not sufficient for deploying AI in safety-critical environments.
Robustness evaluation should always be part of the development process for ViT-based models.
FGSM+PGD as a Practical Baseline
This combination provides a strong and relatively training-efficient defense, making it a good starting point for applications where computational resources are limited.
CW Attack Challenges
CW remains the most difficult attack to defend against, suggesting that hybrid adversarial training or advanced techniques like defensive distillation may be necessary for production-level robustness.
Future Research Directions
Explore multi-stage training, where simple attacks (FGSM/PGD) are used first, followed by highly optimized attacks like CW for fine-tuning robustness.
Investigate attention-aware defense strategies, leveraging the unique structure of ViTs.