This presentation examines the vulnerability of deep learning models to adversarial attacks, focusing on the Fast Gradient Sign Method (FGSM). The study compares the robustness of three representative network architectures (CNN, ResNet-18, and AlexNet) against adversarial perturbations on the CIFAR-10 dataset. It evaluates model performance under varying attack intensities (𝜖 values), analyzes both full and partial attacks, and performs cross-model evaluations to determine the generalizability of adversarial examples. Ultimately, the research aims to provide foundational insights for designing more robust AI systems in security-critical applications.
Deep learning models have achieved remarkable performance in tasks like image classification and natural language processing. However, these models are critically vulnerable to adversarial attacks: minor, often imperceptible perturbations to the input can lead to significant misclassifications.
Adversarial Attacks: Techniques that subtly alter input data (e.g., by adding carefully crafted, nearly imperceptible perturbations) to mislead the model.
Research Questions:
How do different neural network architectures respond to FGSM-based adversarial attacks?
Can partial attacks reveal localized vulnerabilities within these models?
Do adversarial examples generated from one model transfer and deceive other models?
✅ Provides a comparative analysis of CNN, ResNet-18, and AlexNet under FGSM attacks.
✅ Examines the effects of full attacks versus partial attacks to identify region-specific vulnerabilities.
✅ Conducts cross-model evaluations to assess whether adversarial examples are transferable across different architectures.
✅ Offers experimental evidence to guide the development of more resilient neural network designs for security-sensitive applications.
FGSM (Fast Gradient Sign Method): An attack method that generates adversarial samples by using the gradient of the loss function with respect to the input.
Formula: 𝑥′ = 𝑥 + 𝜖 · sign(∇ₓ J(𝜃, 𝑥, 𝑦)), where 𝑥 is the input image, 𝑦 its true label, 𝜃 the model parameters, and J the loss function.
A larger 𝜖 increases the attack intensity; even small values of 𝜖 produce nearly imperceptible perturbations that can mislead the model (a minimal PyTorch sketch of this step follows).
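As a minimal illustration of this formula, the PyTorch sketch below assumes a trained classifier `model`, cross-entropy as the loss J, and image tensors already scaled to [0, 1]; the helper name `fgsm_attack` is illustrative rather than taken from the study's code.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Generate adversarial examples via x' = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # J(theta, x, y)
    loss.backward()                        # fills x.grad with the input gradient
    x_adv = x + epsilon * x.grad.sign()    # single signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range
```

For example, `fgsm_attack(model, images, labels, 0.1)` corresponds to the 𝜖 = 0.1 setting evaluated later.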
Dataset: CIFAR-10, consisting of 60,000 32×32 color images categorized into 10 classes (50,000 for training, 10,000 for testing).
Neural Network Models:
CNN: A simple architecture using convolutional and pooling layers to capture local features.
ResNet-18: Utilizes residual connections to facilitate training in deeper networks and mitigate the vanishing gradient problem.
AlexNet: A relatively shallow network with large filters for feature extraction.
Experimental Environment & Hyperparameters:
Implemented using the PyTorch framework on an Nvidia RTX 4060 GPU.
Learning Rate: 0.01, Batch Size: 64, Optimizer: Adam, Epochs: 50.
FGSM attack intensity (𝜖 values) ranging from 0 to 1.0.
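As a concrete illustration of this setup, the sketch below wires up the listed hyperparameters (Adam, learning rate 0.01, batch size 64, 50 epochs) with torchvision's CIFAR-10 loader. The study's exact CNN, ResNet-18, and AlexNet definitions are not reproduced here, so `torchvision.models.resnet18` stands in for one of the three architectures.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10: 50,000 training / 10,000 test 32x32 color images in 10 classes.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# ResNet-18 with a 10-class output head; the CNN and AlexNet models would be trained the same way.
model = torchvision.models.resnet18(num_classes=10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):                    # 50 epochs, as listed above
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```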
Attack Types:
Full Attack: FGSM applied to all pixels of the image to generate adversarial examples.
Partial Attack: FGSM applied only to specific regions (e.g., top, center, border) to examine localized vulnerabilities; a masked-FGSM sketch follows this list.
Cross-model Evaluation: Evaluates whether adversarial examples generated from one model remain effective when used against other models.
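One plausible way to implement the partial attack is to multiply the FGSM perturbation by a binary mask covering only the targeted region; the sketch below assumes this mechanism and uses illustrative region masks, since the study's exact region definitions are not reproduced here.

```python
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, epsilon, mask):
    """Apply the FGSM perturbation only where mask == 1 (partial attack)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    mask = mask.to(x.device)
    perturbation = epsilon * x.grad.sign() * mask     # zero out pixels outside the target region
    return (x + perturbation).clamp(0.0, 1.0).detach()

# Illustrative region masks for 32x32 CIFAR-10 images (broadcast over batch and channels):
top_mask = torch.zeros(1, 1, 32, 32)
top_mask[:, :, :16, :] = 1.0             # top half of the image

center_mask = torch.zeros(1, 1, 32, 32)
center_mask[:, :, 8:24, 8:24] = 1.0      # central 16x16 patch

border_mask = 1.0 - center_mask          # everything outside the central patch
```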
Full Attack Findings:
At 𝜖 = 0.01, most samples are still correctly classified.
At 𝜖 = 0.1, there is a sharp decline in classification accuracy across all models.
For 𝜖 ≥ 0.3, nearly all samples are misclassified.
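Results like these are typically obtained by sweeping 𝜖 over the test set and measuring accuracy on the perturbed images. The sketch below shows such a sweep, assuming the hypothetical `fgsm_attack` helper from the earlier sketch and a `test_loader` over the CIFAR-10 test split.

```python
import torch

def accuracy_under_attack(model, loader, epsilon, device="cuda"):
    """Classification accuracy on FGSM-perturbed test images at a given epsilon."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        x_adv = fgsm_attack(model, images, labels, epsilon)   # helper from the earlier sketch
        with torch.no_grad():
            preds = model(x_adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# test_loader is assumed to wrap the 10,000-image CIFAR-10 test split.
for eps in [0.0, 0.01, 0.1, 0.3, 0.5, 1.0]:
    print(f"epsilon={eps:.2f}  accuracy={accuracy_under_attack(model, test_loader, eps):.3f}")
```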
Partial Attack Findings:
CNN shows significant performance degradation even when attacks target only specific regions.
ResNet-18 demonstrates relative robustness due to its residual architecture.
Under full attacks, AlexNet is the most vulnerable overall, whereas CNN is particularly sensitive to partial attacks.
Cross-model Evaluation:
Adversarial examples generated by one model often lead to high misclassification rates in other models, with AlexNet-derived examples being especially aggressive.
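In code, transferability can be checked by crafting adversarial examples on one (source) model and measuring how often a different (target) model misclassifies them. The sketch below assumes the hypothetical `fgsm_attack` helper from the earlier sketch and already-trained `source_model` and `target_model` instances.

```python
import torch

def transfer_misclassification_rate(source_model, target_model, loader, epsilon, device="cuda"):
    """Fraction of adversarial examples crafted on source_model that target_model misclassifies."""
    source_model.eval()
    target_model.eval()
    fooled, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        x_adv = fgsm_attack(source_model, images, labels, epsilon)  # crafted on the source model
        with torch.no_grad():
            preds = target_model(x_adv).argmax(dim=1)               # evaluated on the target model
        fooled += (preds != labels).sum().item()
        total += labels.size(0)
    return fooled / total
```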
Architectural Vulnerability Analysis: The study experimentally confirms that different network architectures exhibit varying degrees of vulnerability to FGSM attacks.
Generalization of Adversarial Examples: It demonstrates that adversarial samples are not model-specific; examples crafted from one architecture can deceive others, highlighting a widespread threat.
Guidance for Robust Model Design: The experimental findings provide valuable data for designing more resilient AI systems in applications such as autonomous driving, medical diagnostics, and financial forecasting.
Study Limitations:
The evaluation is limited to FGSM attacks, without testing more potent techniques such as PGD or CW.
Experiments are confined to the CIFAR-10 dataset, necessitating further validation in real-world scenarios.
Future Research Directions:
✅ Expand the analysis to include additional adversarial attacks (e.g., PGD, CW, DeepFool); a brief PGD sketch is included after this list for illustration.
✅ Explore various defense mechanisms, such as adversarial training, input preprocessing, and gradient masking.
✅ Assess the implications of adversarial attacks in practical settings like autonomous vehicles and medical imaging.
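As one illustration of the first direction, PGD (Projected Gradient Descent) can be viewed as iterated FGSM with a projection back into an 𝜖-ball around the original image. The sketch below is a generic, hedged rendering of that idea rather than an experiment from this study; `alpha` and `steps` are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha=0.01, steps=10):
    """PGD: repeated small FGSM steps, projected back into the L-infinity epsilon-ball around x."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                    # one FGSM-style step
        x_adv = torch.clamp(x_adv, x_orig - epsilon, x_orig + epsilon)  # project into the epsilon-ball
        x_adv = x_adv.clamp(0.0, 1.0)                                   # keep a valid pixel range
    return x_adv.detach()
```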
8️⃣ Conclusion
📌 Key Takeaways:
FGSM-based adversarial attacks reveal distinct vulnerabilities across different neural network architectures—CNN and AlexNet are notably susceptible, while ResNet-18 remains relatively robust.
Both full and partial attack strategies demonstrate that adversarial examples can compromise model performance, and these effects are transferable across models.
📌 Final Thoughts:
✅ This study provides a systematic analysis of adversarial vulnerabilities in deep learning models and offers experimental evidence to support the development of more robust network designs.
✅ Future research should integrate multiple attack and defense strategies to enhance the security of AI systems in real-world applications.
Sunjun Hwang is an undergraduate researcher at the RAISE Lab, Yonsei University. His academic and research interests are centered on cutting-edge areas within computer science and quantum technology. He is particularly focused on quantum computing, exploring the principles and applications of quantum systems to solve complex computational problems. Additionally, Sunjun is deeply engaged in the study of quantum algorithms, investigating innovative approaches to designing algorithms that leverage quantum mechanics for enhanced computational efficiency. His interest in quantum artificial intelligence reflects his pursuit of integrating quantum computing techniques with AI methodologies to develop advanced intelligent systems. Furthermore, he is involved in artificial intelligence security, focusing on safeguarding AI systems against vulnerabilities. A key aspect of his work includes studying adversarial attacks, where he examines techniques used to manipulate or compromise AI models, aiming to develop robust defenses to enhance the security of intelligent systems.
Yohan Ko is an Associate Professor in the Software Department at Yonsei University, where he has been serving since March 2021. He holds a Ph.D. (2018) and B.S. (2012) in Computer Science from Yonsei University. His research interests focus on embedded systems, hardware/software co-design, and system reliability. Dr. Ko has served as a reviewer for conferences like the International Conference on Hardware/Software Codesign and System Synthesis and journals such as IEEE Transactions on Systems, Man, and Cybernetics. He advises the startup club "Abroad.com" and has delivered invited talks on computing integrity and employment strategies. His accolades include the Bronze Best Paper Award (IEEE, 2016) and the NAVER Ph.D. Fellowship Award (2016).