Research Highlight

Exploiting the Parabolic Loss Landscape to Accelerate Black-Box Adversarial Attack

Achievement


Black-box attack methods require a massive number of queries to find a successful adversarial perturbation. Since each query to the target model costs time and money, query efficiency is a prerequisite for any practical black-box attack. Recent years have seen the development of several black-box approaches with significantly improved query efficiency. However, current black-box attacks access the target model only at perturbed samples and rely entirely on the queries made there to update the perturbation at each iteration. To reduce the number of queries, it would be beneficial to extract more information from these queries, e.g., inferring loss values and identifying candidate perturbations at points where no model query was made. This is a challenging goal: since the landscapes of adversarial losses are often complicated and not well understood, the accuracy of loss-value approximations built from available model queries is not guaranteed.


In this paper, we develop a new l2 black-box adversarial attack in the frequency domain, which uses an interpolation scheme to approximate the loss values around the current state and guide the perturbation updates. We refer to our method as Black-box Attack Based on IntErpolation Scheme (BABIES). The algorithm is inspired by our observation that, for many standard and robust image classifiers, the adversarial loss behaves like a parabola with respect to perturbations of an image in the Fourier domain and thus can be captured with quadratic interpolation. We treat the adversarial attack problem as a constrained optimization on an l2 sphere and sample along geodesic curves on the sphere. If the queries show improvement, we accept the perturbation; otherwise, we infer a small perturbation from those samples without additional queries. Our method achieves significantly improved query efficiency because the perturbation updates are informed not only directly by model queries (as in existing approaches), but also by an accurate quadratic approximation of the adversarial loss around the current state. The main contributions of this work can be summarized as follows:

  • Theoretical and empirical justification of the observation that the adversarial loss behaves like a parabola in the Fourier domain, but not in the pixel domain.

  • Development of BABIES, a random-search-based black-box adversarial attack method that exploits the parabolic loss landscape to improve the query efficiency.

  • Extensive evaluations of BABIES with targeted and untargeted attacks on MNIST, CIFAR-10 and ImageNet datasets with both standard and defended models.
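The accept-or-interpolate update along a geodesic can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: the function names (`geodesic_point`, `interpolation_step`), the step size `alpha`, and the oracle `loss_fn` are hypothetical. The key point is that when neither sampled point improves the loss, the vertex of the parabola fitted through the three known values yields a new perturbation, and its loss is inferred rather than queried.

```python
import numpy as np

def geodesic_point(x, u, t, eps):
    """Point at arc parameter t on the geodesic through x (a point on the
    l2 sphere of radius eps) in the unit tangent direction u."""
    return np.cos(t) * x + eps * np.sin(t) * u

def interpolation_step(x, u, loss_fn, eps, alpha=0.1):
    """One accept-or-interpolate update (hypothetical sketch).

    Queries the loss at +/- alpha along the geodesic. If either query
    improves on the current loss, accept it. Otherwise, estimate the
    vertex of the parabola through the three values and move there,
    inferring its loss without an additional model query."""
    l0 = loss_fn(x)
    x_plus = geodesic_point(x, u, alpha, eps)
    x_minus = geodesic_point(x, u, -alpha, eps)
    l_plus, l_minus = loss_fn(x_plus), loss_fn(x_minus)

    # Accept a queried point if it already lowers the loss.
    if min(l_plus, l_minus) < l0:
        return (x_plus, l_plus) if l_plus < l_minus else (x_minus, l_minus)

    # Quadratic interpolation through (-alpha, l_minus), (0, l0), (alpha, l_plus).
    # denom > 0 means the fitted parabola opens upward and has a minimum.
    denom = l_minus - 2.0 * l0 + l_plus
    if denom <= 0:
        return x, l0  # degenerate fit: keep the current state
    t_star = 0.5 * alpha * (l_minus - l_plus) / denom   # vertex location
    x_star = geodesic_point(x, u, t_star, eps)
    # Loss at the vertex is *inferred* from the parabola, not queried.
    l_star = l0 - (l_plus - l_minus) ** 2 / (8.0 * denom)
    return x_star, l_star
```

On a toy loss that is exactly quadratic in the arc parameter, the vertex estimate is exact, which mirrors the paper's premise: the closer the adversarial loss is to a parabola along the geodesic, the more accurate the query-free update.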

Figure 1: Examples of attacking the Google Cloud Vision API to remove the top-3 labels. The original images and the images perturbed by our method are shown with their top labels and probabilities.

Publication

H. Tran, D. Lu, and G. Zhang, Exploiting the local parabolic landscapes of adversarial losses to accelerate black-box adversarial attack, Proceedings of the 17th European Conference on Computer Vision (ECCV 2022), 2022.