Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems
Gaurav Kumar Nayak*, Ruchit Rawal*, Rohit Lal*, Himanshu Patil, Anirban Chakraborty
Abstract
Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction. Recently, a few works showed inherent bias associated with such attack (robustness bias), where certain subgroups in a dataset (e.g. based on class, gender, etc.) are less robust than others. This bias not only persists even after adversarial training, but often results in severe performance discrepancies across these subgroups. Existing works characterize the subgroup’s robustness bias by only checking individual sample’s proximity to the decision boundary. In this work, we argue that this measure alone is not sufficient and validate our argument via extensive experimental analysis. It has been observed that adversarial attacks often corrupt the high-frequency components of the input image. We, therefore, propose a holistic approach for quantifying adversarial vulnerability of a sample by combining these different perspectives, i.e., degree of model’s reliance on high-frequency features with the (conventional) sample-distance to the decision boundary. We demonstrate that by reliably estimating adversarial vulnerability at the sample level using the proposed holistic metric, it is possible to develop a trustworthy system where humans can be alerted about the incoming samples that are highly likely to be misclassified at test time. This is achieved with better precision when our holistic metric is used over individual measures. To further corroborate the utility of the proposed holistic approach, we perform knowledge distillation in a limited-sample setting. We observe that the student network trained with the subset of samples selected using our combined metric performs better than both the competing baselines, viz., where samples are selected randomly or based on their distances to the decision boundary.
Comparison of Existing Works and Ours
Distance to Decision Boundary (DDB) -Unreliable estimate of adversarial vulnerability
Another perspective: Model Reliance on High Frequency Features
Building Trustworthy Systems
Flagging Precision
Flagging Recall
T-Score vs Steps to Attack
Time Efficient Training of Lightweight models using Knowledge Distillation
Flagging accuracy comparison for (Normalized) DDB score v/s (Our Proposed) Trust-Score (flagging) metrics for the model trained via our Trust Score based sample selection strategy
Citation:
If you find this work useful, please consider citing our work:
@inproceedings{
nayak2022_holistic,
title={Holistic Approach to Measure Sample-level Adversarial Vulnerability and its
Utility in Building Trustworthy Systems},
author={Nayak, G. K., Rawal, R., Lal, R., Patil, H., and Chakraborty, A.},
booktitle={Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Human-centered Intelligent Services Safety and Trustworthy},
year={2022}
}
License:
This project is licenced under an [MIT License].
Contact:
In case you have any queries, please get in touch via email: gauravnayak@iisc.ac.in, ruchitrawal22@gmail.com and rohitlal@iisc.ac.in