Experiments

Idea

To corroborate the existence of holes in the latent space, we must devise an approach to arrive at these latent holes. One naive approach could be to perform random sampling and examine the quality of each generated output. However, such an approach is very time-consuming and therefore infeasible. Instead, we devise an alternative technique to identify the regions in the latent space which could correspond to these holes.

Our proposed method takes inspiration from adversarial attacks. Given a sampled latent vector z, we generate the corresponding output O using a generative model. We also define a loss function L which indicates the quality of the generated image. Then, we perform gradient ascent over z using L, repeating this step for a pre-defined number of iterations. In this way, we can arrive at latent vectors z which correspond to poor generated outputs.
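As a minimal sketch of this idea (assuming a differentiable PyTorch generator G and a scalar quality_loss that is high for poor generations; both names are placeholders, and we show a generic optimiser here, whereas the actual attack uses the modified PGD step sketched in the next section):

```python
import torch

def find_latent_hole(G, quality_loss, z_dim=512, steps=100, lr=0.01):
    """Gradient ascent on a sampled latent vector z to maximise a quality loss."""
    z = torch.randn(1, z_dim, requires_grad=True)   # sample z ~ N(0, I)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = G(z)                                  # generate the output O
        loss = -quality_loss(img)                   # negate so that minimising == ascent
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return z.detach()
```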

Implementation Details

Our adversarial attack makes use of Projected Gradient Descent (PGD) [1]. However, unlike standard PGD, we do not impose any constraints on the gradient step (except in Experiment 1, where we clamp the change in z to 0.05). This is because we want to explore the entire latent space in search of adversarial holes and not just the neighbourhood of the sampled z. We run this modified PGD for 100 iterations. We use the StyleGAN [2] architecture as the baseline generative model and follow the same hyper-parameters as StyleGAN for training.
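A sketch of the update we use in place of the standard PGD projection (the sign-gradient step and elementwise clamping are assumptions about details left open above):

```python
import torch

def modified_pgd_step(z, z0, loss, step_size=0.01, clamp_eps=None):
    """One ascent step on the latent vector z.

    z         : current latent vector (a leaf tensor with requires_grad=True)
    z0        : the originally sampled latent vector
    loss      : scalar objective to maximise
    clamp_eps : if set (0.05 in Experiment 1), clamp the total change in z elementwise;
                if None, the step is unconstrained (Experiments 2-4)
    """
    grad = torch.autograd.grad(loss, z)[0]
    z_new = z + step_size * grad.sign()                    # sign-gradient ascent, as in PGD
    if clamp_eps is not None:
        z_new = z0 + (z_new - z0).clamp(-clamp_eps, clamp_eps)
    return z_new.detach().requires_grad_(True)
```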

Experiment 1: MSE Loss

First, we perform an adversarial attack on the latent code z by maximizing the mean squared error (MSE) between the current perturbed image and the original input image. The network architecture is illustrated below.

Network architecture for Experiment 1. We use StyleGAN's generator (G) to generate images (I). We use the constrained PGD attack to maximize the MSE loss and perform gradient ascent over the latent vector (z).
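The objective for this experiment is simply the pixelwise MSE against the original generation, maximised with the attack loop above (a sketch; G, z_adv, and I_orig are placeholder names):

```python
import torch.nn.functional as F

# I_orig : image generated from the originally sampled z (kept fixed)
# z_adv  : latent vector being attacked
loss = F.mse_loss(G(z_adv), I_orig)   # maximised via gradient ascent on z_adv
```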

Using this setup, we visualize the generated images from StyleGAN.

StyleGAN outputs

Perturbed Images

Observing the outputs, this is an expected result. Since MSE is a paired loss function, any photorealistic face which doesn't look like the input face can maximize the loss during the adversarial attack. Moreover, since the attack is strictly constrained, this setup is unable to explore the entire latent space to arrive at adversarial holes. Therefore, we need to remove the constraints on PGD. We also need a stricter loss function since, in some sense, there are multiple solutions that maximize the MSE loss and an adversarial hole is not the only one.

Experiment 2: StyleGAN Discriminator

A discriminator acts as an unpaired loss function. Therefore, in our next experiment, we perform the PGD attack while maximizing the WGAN loss over the discriminator so as to arrive at latent codes z which are classified as fake by the discriminator with high confidence.

Network architecture for Experiment 2. Instead of maximizing the MSE Loss, we maximize the StyleGAN Discriminator loss during PGD. We also remove the clamping on the size of the gradient step in PGD.

In order to generate an image from StyleGAN, a latent vector is first sampled from a standard normal distribution (the z-space). A (non-linear) latent mapper then transforms this vector into a learned latent distribution (the w-space). In this experiment, we first adversarially perturb the initially sampled vector in the z-space. Next, we separately perturb the resulting mapped latent vector in the w-space. A sketch of the two attack objectives is shown below, followed by the results.
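This sketch assumes a WGAN-style critic D whose score is high for real images and low for fake ones, and the usual G.mapping / G.synthesis decomposition of the StyleGAN generator (both interface details are assumptions):

```python
import torch

# Attack in the z-space: the full generator (mapper + synthesis network) is in the loop.
z_adv = z.clone().detach().requires_grad_(True)
loss_z = -D(G(z_adv)).mean()                      # ascent pushes the critic score down ("fake")

# Attack in the w-space: perturb the output of the latent mapper directly.
w_adv = G.mapping(z).detach().requires_grad_(True)
loss_w = -D(G.synthesis(w_adv)).mean()
```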

StyleGAN outputs

Perturbed Image (Z Space)

Perturbed Image (W Space)

The outputs start to look like random noise. This means that the adversarial attack indeed leads us to latent codes which, when passed through the generator, produce very poor outputs. While these results look promising, we need to verify whether the adversarially perturbed latent code in the z-space still belongs to a standard normal distribution. In order to verify this, we perform PCA, project the latent vectors onto two dimensions, and plot them.
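A sketch of the projection, assuming scikit-learn's PCA and that the projection is fitted jointly on the sampled and adversarial codes (an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# z_sampled, z_adv: arrays of shape (N, 512) with sampled and attacked latent codes
pca = PCA(n_components=2).fit(np.concatenate([z_sampled, z_adv]))
p_sampled, p_adv = pca.transform(z_sampled), pca.transform(z_adv)

plt.scatter(p_sampled[:, 0], p_sampled[:, 1], s=10, label="sampled z")
plt.scatter(p_adv[:, 0], p_adv[:, 1], s=10, c="green", label="adversarial z")
plt.legend()
plt.show()
```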

PCA Projections from Z Space

PCA Projections from W Space

In the z-space, we observe that the sampled latent vectors are tightly packed around the origin. This is expected, since they are sampled from a standard normal distribution. The adversarial latent codes (in green) are much larger in magnitude and likely out of distribution. This is even more evident in the w-space, where all the sampled points collapse into a single dot because the adversarial latent vectors are significantly larger in magnitude.

Therefore, this experiment remains inconclusive: for points outside the distribution, random generations are expected, so this does not demonstrate holes within the latent distribution. In our next experiment, we look to enforce distribution similarity.

Experiment 3: StyleGAN Discriminator + KL Divergence

In this experiment, while maximizing the discriminator loss, we simultaneously minimize the KL divergence between the adversarial latent codes (in the z-space) and a standard normal distribution. In this way, we aim to arrive at latent vectors z which lie within the original latent distribution but do not produce realistic outputs.

Network architecture for Experiment 3. While maximizing the discriminator loss as before, we simultaneously minimize the KL divergence with a standard normal distribution during PGD.
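A sketch of the combined objective. How the KL term is estimated is an implementation detail; here we assume the attacked batch of latent codes is summarised by a diagonal Gaussian and compared to N(0, I) in closed form, with a hypothetical weight lambda_kl:

```python
import torch

def kl_to_standard_normal(z_adv):
    """KL( N(mu, diag(var)) || N(0, I) ), with mu and var estimated from the batch."""
    mu, var = z_adv.mean(dim=0), z_adv.var(dim=0)
    return 0.5 * (var + mu ** 2 - 1.0 - var.log()).sum()

# Combined objective, maximised by gradient ascent over the batch z_adv.
loss = -D(G(z_adv)).mean() - lambda_kl * kl_to_standard_normal(z_adv)
```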

In order to verify our intuition, we plot the trajectory of the projected latent vector across the iterations of the adversarial attack.

latent_code_vis_unbounded.mp4

Without KL Divergence

latent_code_vis_bounded.mp4

With KL Divergence

As is evident from the scale of the plots, the magnitude of the latent vector projections has been reduced. At the very least, this ensures that the adversarial vector does not lie far outside the latent distribution. In order to verify this claim, we visualize the generated faces corresponding to these adversarial vectors.

StyleGAN outputs

Perturbed Images

The faces generated from the adversarial latent vectors contain some artefacts. At this stage, we raise the question: "What does it really mean to have an adversarial latent vector?" Since these faces do not look particularly realistic, they could be considered adversarial. However, we wish to take this one step further. In our formulation, we consider adversarial latent vectors to be those which result in noisy and random generations.

Experiment 4: Noise Discriminator + KL Divergence

In this experiment, we aim to train a noise discriminator which classifies any image (be it from the dataset or generated from StyleGAN) as real and random noise as fake. We use the StyleGAN discriminator and finetune the network for this objective.

For a well-trained discriminator, the decision boundary lies between faces which look realistic and those which look fake. At the extremity of the "fake" half-space (i.e., far away from the decision boundary but on the "fake" side), images which do not remotely look like faces could exist. Therefore, the objective of re-training the StyleGAN discriminator is to shift the decision boundary towards this extremity of the "fake" half-space. We train this discriminator for 10 epochs.
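A sketch of one finetuning step, assuming a WGAN-style critic loss and uniform noise images as the "fake" class (the choice of noise distribution and the training loop details are assumptions; dataloader, G, D_noise, and optimizer are placeholders):

```python
import torch

for real_images in dataloader:
    # "Real" class: dataset images together with StyleGAN generations
    with torch.no_grad():
        generated = G(torch.randn(real_images.size(0), 512))
    real_batch = torch.cat([real_images, generated])

    # "Fake" class: pure random noise images in the same value range as the data
    noise_batch = torch.rand_like(real_batch) * 2 - 1

    # WGAN critic loss: push scores up on real images, down on noise
    d_loss = D_noise(noise_batch).mean() - D_noise(real_batch).mean()
    optimizer.zero_grad()
    d_loss.backward()
    optimizer.step()
```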

In order to perform the adversarial attack, we now maximize the WGAN loss for the noise discriminator rather than the StyleGAN discriminator. As in Experiment 3, we simultaneously minimize the KL divergence with a standard normal distribution to ensure that the adversarial latent codes remain in-domain.

Network architecture for Experiment 4. Instead of using StyleGAN discriminator, we use a noise discriminator and perform PGD as before.

Using this network, we visualize the generated faces after the adversarial attack below. We also plot the PCA projections corresponding to the adversarially attacked latent vectors.

StyleGAN outputs

Perturbed Images

PCA Projections

The faces generated from latent vectors in these adversarial holes do not look realistic at all. Moreover, most of the projections lie close to the original distribution; by and large, they appear to lie within the original latent distribution. Therefore, we have successfully arrived at adversarial holes, i.e., latent vectors (obtained adversarially) within the latent distribution whose generated images are very noisy and not realistic.

Experiment 5: Noise Discriminator + Latent Code Discriminator

A discriminative loss is a stronger penalty than KL divergence for enforcing the in-domain constraint on the adversarial latent vectors. Therefore, in this experiment, we aim to train a latent code discriminator which can distinguish between real and fake latent codes. For this experiment, we operate in the w-space.

Since the concept of an adversarial latent code is not well-defined, we wish to train an adversarial mapper, i.e., a latent mapper which maps a given latent vector into the adversarial holes of the w-space. Its outputs serve as the fake latent vectors for the discriminator, while the outputs of StyleGAN's latent mapper serve as the real latent vectors. We train this setup end-to-end, initializing the adversarial latent mapper with the same weights as StyleGAN's latent mapper.

Network architecture for Experiment 5. Instead of using KL Divergence, we add a discriminator (D_latent) to distinguish between in-distribution (w_real) and out-of-distribution (w_adv) latent vectors. In order to arrive at w_adv, we train an adversarial mapper to map a sampled latent vector (z) into the adversarial holes. Concretely, the noise discriminator (D_noise) should classify the generated image (I_adv) as random noise but D_latent should classify it as in-distribution [3].
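A sketch of the intended training objective (signs follow the WGAN convention used above; M_adv, D_latent, D_noise, batch_size, and lambda_latent are placeholders, and in practice this objective did not converge):

```python
import torch

z = torch.randn(batch_size, 512)
w_real = G.mapping(z).detach()        # StyleGAN's own mapper -> in-distribution latents
w_adv = M_adv(z)                      # adversarial mapper -> candidate hole latents
I_adv = G.synthesis(w_adv)

# Latent discriminator: separate real StyleGAN latents from adversarially mapped ones
d_latent_loss = D_latent(w_adv.detach()).mean() - D_latent(w_real).mean()

# Adversarial mapper: its image should look like noise to D_noise,
# while its latent should look in-distribution to D_latent
mapper_loss = D_noise(I_adv).mean() - lambda_latent * D_latent(w_adv).mean()
```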

Unfortunately, this setup proves extremely hard to train. We observe exploding loss values and gradients and, despite our best efforts, could not get the training to converge.

Quantitative Evaluation

In order to verify that the generated images are in distribution, we also compute Clean-FID [4] at the end of each experiment.
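With the clean-fid package, this amounts to comparing a folder of generated images against a folder of reference images (the folder names below are placeholders):

```python
from cleanfid import fid

# FID of the sampled and the adversarially perturbed generations against the real data
fid_sampled = fid.compute_fid("outputs/sampled", "data/real_images")
fid_perturbed = fid.compute_fid("outputs/perturbed", "data/real_images")
print(fid_sampled, fid_perturbed)
```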

Quantitative Metrics. We compute the FID at the end of each experiment.

In the z-space, the lowest FID is observed using the MSE loss and constrained PGD, since the strong clamping on the gradient step prevents the latent codes from drifting away from the original latent distribution. This is also supported by the fact that the FID is similar to the FID computed for the sampled images. In the unconstrained PGD setting, the FID is highest when the KL divergence between z and adv_z is not enforced. Moreover, it is also significantly greater than the FID for the sampled images, which suggests that the adversarial latent codes may have strayed out of distribution. On enforcing the KL divergence, we observe a slight fall in the FID. Finally, on replacing the StyleGAN discriminator with the noise discriminator, we obtain similar results, which indicates that adding the KL divergence is indeed necessary.

In the w-space, we observe surprising results. In Experiments 1 and 2, we observe a lower FID than for the sampled images, which may be due to sampling artefacts: we use a small sample size (15) due to compute constraints, which significantly impacts the FID metric. For the experiments that follow, since the exact distribution of the w-space is not known in closed form, we cannot enforce the KL divergence.

References

  1. Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).

  2. Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

  3. Leng, Guangjie, Yekun Zhu, and Zhi-Qin John Xu. "Force-in-domain GAN inversion." arXiv preprint arXiv:2107.06050 (2021).

  4. Parmar, Gaurav, Richard Zhang, and Jun-Yan Zhu. "On Aliased Resizing and Surprising Subtleties in GAN Evaluation." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.