SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing

Abstract

Deep neural networks (DNNs) have achieved great success in various applications due to their strong expressive power. However, recent studies have shown that DNNs are vulnerable to adversarial examples: manipulated instances crafted to mislead DNNs into making incorrect predictions. Currently, most adversarial examples guarantee a “subtle perturbation" by limiting its Lp norm. In this paper, we explore the impact of semantic manipulation on DNN predictions by editing the semantic attributes of images to generate “unrestricted adversarial examples". Such semantic-based perturbations are more practical than Lp-bounded perturbations. In particular, we propose SemanticAdv, an algorithm that leverages disentangled semantic factors to generate adversarial perturbations by altering controlled semantic attributes, fooling the learner toward various “adversarial" targets. We conduct extensive experiments showing that these semantic adversarial examples not only fool different learning tasks such as face verification and landmark detection, but also achieve a high targeted attack success rate against real-world black-box services such as the Azure face verification service via transferability. To further demonstrate the applicability of SemanticAdv beyond the face recognition domain, we also generate semantic perturbations on street-view images. Such adversarial examples with controlled semantic manipulation can shed light on the vulnerabilities of DNNs as well as on potential defense approaches.
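The core idea can be sketched as follows: given an attribute-conditioned generator, interpolate between the generator features of the original image and of an attribute-edited copy, and optimize the interpolation so that the decoded image drives a face verification model toward a chosen target identity. The PyTorch snippet below is a minimal sketch of this procedure under those assumptions, not the authors' implementation; the AttrConditionedGenerator and FaceEmbedder modules, the smoothness weight, and all hyperparameters are illustrative placeholders.

```python
# Minimal sketch (NOT the authors' code) of a semantic attack via feature-space
# interpolation in an attribute-conditioned generator. All modules below are
# tiny stand-ins for the real generator and face verification network.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttrConditionedGenerator(nn.Module):
    """Stand-in for an attribute-conditioned image generator."""
    def __init__(self, attr_dim=17):
        super().__init__()
        self.enc = nn.Conv2d(3 + attr_dim, 16, 3, padding=1)
        self.dec = nn.Conv2d(16, 3, 3, padding=1)

    def encode(self, x, attr):
        # Broadcast the attribute vector spatially and encode it with the image.
        a = attr.view(attr.size(0), -1, 1, 1).expand(-1, -1, x.size(2), x.size(3))
        return self.enc(torch.cat([x, a], dim=1))

    def decode(self, feat):
        return torch.tanh(self.dec(feat))


class FaceEmbedder(nn.Module):
    """Stand-in for a face verification network producing identity embeddings."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 64))

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)


def semantic_adv(G, embedder, x, attr, attr_edited, target_emb, steps=200, lr=0.05):
    """Optimize an interpolation tensor alpha in [0, 1] mixing the generator
    features of the original and attribute-edited inputs, so that the decoded
    image matches the target identity embedding."""
    f_orig = G.encode(x, attr).detach()
    f_edit = G.encode(x, attr_edited).detach()
    alpha = torch.full_like(f_orig, 0.5, requires_grad=True)
    opt = torch.optim.Adam([alpha], lr=lr)
    for _ in range(steps):
        a = alpha.clamp(0.0, 1.0)
        x_adv = G.decode(a * f_edit + (1.0 - a) * f_orig)
        # Adversarial objective: pull the embedding toward the target identity.
        loss_adv = 1.0 - F.cosine_similarity(embedder(x_adv), target_emb).mean()
        # Smoothness prior keeps the interpolation (and thus the edit) coherent.
        loss_smooth = (a[..., :, 1:] - a[..., :, :-1]).abs().mean() + \
                      (a[..., 1:, :] - a[..., :-1, :]).abs().mean()
        loss = loss_adv + 0.1 * loss_smooth
        opt.zero_grad()
        loss.backward()
        opt.step()
    a = alpha.detach().clamp(0.0, 1.0)
    return G.decode(a * f_edit + (1.0 - a) * f_orig)


if __name__ == "__main__":
    G, embedder = AttrConditionedGenerator(), FaceEmbedder()
    x = torch.rand(1, 3, 32, 32)                       # original face image
    attr = torch.zeros(1, 17)
    attr_edited = attr.clone()
    attr_edited[0, 0] = 1.0                            # e.g. turn on a "pale skin" attribute
    target = F.normalize(torch.randn(1, 64), dim=1)    # embedding of the target identity
    x_adv = semantic_adv(G, embedder, x, attr, attr_edited, target)
    print(x_adv.shape)
```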

Video Demo: Attacking Real-world Face Verification Platform with Semantic Adversarial Examples


Video Demo: Generating Semantic Adversarial Examples using Attribute-Conditioned Generator


Visualizations: Attacking Online Face Verification Platform

Figure 1: Illustration of our SemanticAdv on a real-world face verification platform (editing the "pale skin" attribute). Note that the confidence denotes the likelihood that the two faces belong to the same person.


Visualizations: Single-attribute Adversarial Attack on Face Images

Figure 2: Qualitative analysis of single-attribute adversarial attacks.


Visualizations: Comparisons with Pixel-wise Adversarial Attack on Face Images

Figure 3: Qualitative comparisons between our proposed SemanticAdv and pixel-wise adversarial examples generated by CW. Along with the adversarial examples, we also show the corresponding perturbations (residuals) on the right.

Figure 4: Qualitative analysis of single-attribute adversarial attacks. Along with the adversarial examples, we also show the corresponding perturbations (residuals) on the right.

Figure 5: Qualitative comparisons among the ground truth, pixel-wise adversarial examples generated by CW, and our proposed SemanticAdv. Here we show results obtained under G-FPR < 0.0001 so that the perturbations are visually obvious.
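For contrast with the semantic perturbations above, the snippet below is a minimal sketch of the kind of pixel-wise L2 attack referred to as CW in these comparisons. It only captures the general recipe (minimize perturbation norm plus a weighted misclassification loss); the actual CW attack additionally uses a margin-based objective, a tanh change of variables, and a binary search over the trade-off constant, and the classifier here is a placeholder.

```python
# Minimal sketch of a pixel-wise, L2-bounded targeted attack in the spirit of CW.
# Shown only to contrast with semantic perturbations; not the exact CW algorithm.
import torch
import torch.nn.functional as F


def l2_targeted_attack(classifier, x, target_class, steps=500, lr=0.01, c=1.0):
    """Minimize ||delta||_2^2 + c * CE(classifier(x + delta), target)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        x_adv = (x + delta).clamp(0.0, 1.0)
        loss = delta.pow(2).sum() + c * F.cross_entropy(classifier(x_adv), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta.detach()).clamp(0.0, 1.0)


if __name__ == "__main__":
    # Placeholder classifier: a single linear layer over flattened pixels.
    classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(1, 3, 32, 32)
    x_adv = l2_targeted_attack(classifier, x, target_class=3)
    print((x_adv - x).norm().item())
```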


Visualizations: Layout-Conditioned Adversarial Attacks on Street-view Images

Figure 6: Qualitative results of layout-conditioned adversarial attacks.