Related Work

Generative Adversarial Networks (GANs)

Style-based GANs [1, 2, 3] build on progressive growing GANs [4]. They implicitly learn hierarchical latent styles for image generation and control the style of an image by manipulating the per-channel mean and variance of intermediate feature maps. As shown in the figure below, StyleGAN takes style vectors (output by a mapping network) and stochastic variation (provided by the noise layers) as inputs for image synthesis. This offers control over the style of generated images at different levels of detail and enables the generation of photorealistic images. In this work, we use StyleGAN [1] as the base model for our experiments.

(a) Architecture of a traditional generator. (b) Architecture of StyleGAN: latent codes sampled in the z space are mapped to a more disentangled w space, which is fed to the synthesis network g at different hierarchies.
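
Concretely, the per-channel mean and variance are manipulated via adaptive instance normalization (AdaIN). The following is a minimal PyTorch sketch of the idea; the module and argument names are illustrative, not StyleGAN's actual implementation:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalizes each feature map to zero
    mean and unit variance, then re-scales and re-shifts it with per-channel
    statistics predicted from the style vector w (illustrative sketch)."""

    def __init__(self, latent_dim: int, num_channels: int):
        super().__init__()
        # Learned affine map: style vector w -> per-channel (scale, bias).
        self.affine = nn.Linear(latent_dim, num_channels * 2)

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature maps; w: (N, latent_dim) style vectors.
        scale, bias = self.affine(w).chunk(2, dim=1)  # each (N, C)
        scale = scale[:, :, None, None] + 1.0         # center scale at 1
        bias = bias[:, :, None, None]
        # Instance statistics computed per sample and per channel.
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-8
        return scale * (x - mean) / std + bias
```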

GAN Inversion

GANs encode rich semantic information in their latent space. This information presents itself as different attributes, which can be thought of as individual basis vectors in the latent space, making it feasible to control and explain the generation process of a GAN. GAN inversion is a technique that maps real images back to the latent space; finding a representative and disentangled latent space is crucial to enabling semantic image editing. GAN inversion can be formulated as the following optimization problem:

$$z^{*} = \underset{z}{\arg\min}\; \mathcal{L}\big(G(z),\, x\big)$$

where $x$ is the target image, $G$ is the pretrained generator, and $\mathcal{L}$ is a reconstruction objective such as a pixel-wise or perceptual distance.
Several methods [7, 8, 9] have been developed to solve the above equation, with formulations based on learning, optimization, or both. A learning-based inversion method trains an encoder network to map an image into the latent space such that the image reconstructed from the latent code is as similar to the original as possible. An optimization-based inversion approach directly solves the objective function through back-propagation, searching for a latent code that minimizes a pixel-wise reconstruction loss. A hybrid approach first uses an encoder to produce an initial latent code and then refines it with an optimization algorithm.
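
As a concrete illustration, the following is a minimal sketch of the optimization-based approach, assuming a pretrained differentiable generator `G` and a target image `x_target` (both names are placeholders):

```python
import torch
import torch.nn.functional as F

def invert(G, x_target, latent_dim=512, steps=1000, lr=0.01):
    """Optimization-based GAN inversion: directly search for a latent code
    z whose reconstruction G(z) minimizes a pixel-wise loss to x_target."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.mse_loss(G(z), x_target)  # pixel-wise reconstruction loss
        loss.backward()                    # back-propagate through G
        optimizer.step()
    return z.detach()
```

A hybrid method would simply replace the random initialization of `z` with the output of a learned encoder before running the same refinement loop.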

A key factor that affects the quality of GAN inversion is whether the inverted code lies within the region of the latent space covered by the training examples. A few recent approaches force the inverted code to lie within this region by imposing constraints on the target image at the pixel level, or directly in the latent space. A recent technique [6] enforces this constraint in the w latent space by training an auxiliary discriminator that learns to differentiate between real and fake w latent vectors (shown in the figure below). Our network design also takes inspiration from this work.

The network architecture of [6]. The FC block is the StyleGAN mapping network, which maps latent codes z drawn from a Gaussian distribution to the w latent space. Only the green blocks are trainable.
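
The sketch below illustrates the idea of such a latent discriminator; the module names (`D_w`, `mapping_net`, `encoder`) and the specific non-saturating loss are our assumptions, not necessarily the exact formulation used in [6]:

```python
import torch
import torch.nn.functional as F

def latent_adversarial_losses(D_w, mapping_net, encoder, z, images):
    """Adversarial losses on the w latent space (sketch). 'Real' w vectors
    come from the (frozen) StyleGAN mapping network applied to Gaussian
    samples z; 'fake' w vectors come from the encoder applied to images."""
    w_real = mapping_net(z)
    w_fake = encoder(images)
    # Discriminator loss: w_fake is detached so only D_w receives gradients.
    d_loss = (F.softplus(D_w(w_fake.detach())).mean()
              + F.softplus(-D_w(w_real)).mean())
    # Encoder regularizer: push inverted codes toward the real-w region.
    e_loss = F.softplus(-D_w(w_fake)).mean()
    return d_loss, e_loss
```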

Adversarial Attack

Interpolation is commonly used to explore the robustness of the latent space [1, 2, 3, 4]. However, to the best of our knowledge, the robustness of the latent space viewed from an adversarial perspective has received very limited attention.

Fast Gradient Sign Method (FGSM) [10] and Projected Gradient Descent (PGD) [11] are adversarial attacks that perform gradient ascent with respect to the classification loss to arrive at perturbed images that fool the classifier. In this work, we modify PGD to work with loss functions better suited to generative models.
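
A minimal sketch of PGD generalized to an arbitrary differentiable loss is shown below, assuming PyTorch and inputs normalized to [0, 1]; the function name and hyperparameter values are illustrative:

```python
import torch

def pgd_attack(loss_fn, x, epsilon=0.03, alpha=0.01, steps=40):
    """Projected Gradient Descent: repeatedly take a signed gradient-ascent
    step on an arbitrary differentiable loss, then project the perturbation
    back into an L-infinity ball of radius epsilon around the input x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(x_adv)                             # any scalar loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()      # ascent step
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # project to ball
        x_adv = x_adv.clamp(0.0, 1.0)             # keep valid pixel range
    return x_adv
```

With a classification loss this reduces to the standard attack of [11]; substituting a reconstruction or latent-space loss adapts it to generative models.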

References

  1. Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.

  2. Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

  3. Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021).

  4. Karras, Tero, et al. "Progressive growing of gans for improved quality, stability, and variation." arXiv preprint arXiv:1710.10196 (2017).

  5. Zhu, Jiapeng, et al. "In-domain gan inversion for real image editing." European conference on computer vision. Springer, Cham, 2020.

  6. Leng, Guangjie, Yekun Zhu, and Zhi-Qin John Xu. "Force-in-domain GAN inversion." arXiv preprint arXiv:2107.06050 (2021).

  7. Zhu, Jun-Yan, et al. "Generative visual manipulation on the natural image manifold." European conference on computer vision. Springer, Cham, 2016.

  8. Abdal, Rameen, Yipeng Qin, and Peter Wonka. "Image2stylegan: How to embed images into the stylegan latent space?." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

  9. Richardson, Elad, et al. "Encoding in style: a stylegan encoder for image-to-image translation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

  10. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).

  11. Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).