Background
Best Viewed at https://sites.google.com/andrew.cmu.edu/multigan-distillation (CMU access)
Code available here
A typical Knowledge Distillation [1] framework trains a student network (usually small) to mimic the behavior of a teacher network (usually large). Very few works have explored this framework for generative tasks [8, 10]. A central difficulty in GAN distillation (or compression) is that redundant weights are hard to identify and prune due to the high complexity of the image space. Moreover, the metrics used to evaluate generated images do not fully capture visual quality as perceived by humans. These effects, coupled with the instability of GAN training, are the prime reasons for slow progress in this area.
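To make the setup concrete, below is a minimal sketch of the distillation objective of Hinton et al. [1] in PyTorch. The function name, the temperature `T`, and the `student_logits`/`teacher_logits` arguments are illustrative assumptions, not names from any particular codebase.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label distillation as in [1]: KL divergence between the
    temperature-softened teacher and student distributions, so the
    student learns to mimic the teacher's outputs."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable as T varies,
    # as noted in [1].
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * T * T
```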
Recent methods [8, 11, 15, 16] propose distillation at various granularities: from low-level pixel-wise statistics at the generator outputs to high-level content matching using perceptual losses. Chang et al. [10] proposed a black-box distillation framework that does not require access to the internals of the source-GAN. However, most methods assume access to a single source-GAN trained on a chosen (known) dataset. In practice, however, one often has access to several publicly available models but not their associated training datasets. This calls for a data-efficient (or data-free) approach to GAN distillation.
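As a rough illustration of these two granularities, the following hedged sketch (assuming PyTorch and torchvision; `student_G`, `teacher_G`, and the loss weights are hypothetical names, not drawn from any of the cited methods) combines a pixel-wise term on generator outputs with a perceptual term computed on frozen VGG-16 features.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor (up to relu3_3) for the perceptual
# term; newer torchvision versions use weights=... instead of pretrained.
_vgg = vgg16(pretrained=True).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad = False

def gan_distill_loss(student_G, teacher_G, z, w_pix=1.0, w_perc=1.0):
    """Match the student's output to the teacher's on the same latent
    codes z: a low-level pixel loss plus a high-level perceptual loss,
    in the spirit of [8, 11, 15, 16]. Input normalization for VGG is
    omitted here for brevity."""
    fake_student = student_G(z)
    with torch.no_grad():
        fake_teacher = teacher_G(z)  # teacher output acts as the target
    pixel_loss = F.l1_loss(fake_student, fake_teacher)
    perceptual_loss = F.l1_loss(_vgg(fake_student), _vgg(fake_teacher))
    return w_pix * pixel_loss + w_perc * perceptual_loss
```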
Several data-free knowledge distillation and domain adaptation techniques have been investigated [3, 4, 5, 6, 17]. These methods work in the absence of the corresponding training datasets of the source (teacher) models. However, they are limited to image classification and do not extend to image generation. On a parallel front, Sankaranarayanan et al. [14] use a GAN framework to define auxiliary loss functions in the image space that enhance knowledge transfer across domains. While this suggests that discriminative and generative objectives can be combined during distillation, the method still requires access to the source model's training datasets.
Most prior works consider each scenario in isolation and are therefore limited in scope, either by 1) requiring the training datasets or 2) being inapplicable to image generation tasks. To complement this growing line of work, we investigate GAN distillation and present insights for achieving successful knowledge transfer.
[1] Hinton et al., “Distilling the Knowledge in a Neural Network”, NeurIPS Deep Learning and Representation Learning Workshop (2015).
[2] Karras et al., “Analyzing and Improving the Image Quality of StyleGAN”, CVPR (2020).
[3] Addepalli et al., “DeGAN: Data-Enriching GAN for Retrieving Representative Samples”, AAAI (2020).
[4] Kurmi et al., “Domain Impression: A Source Data Free Domain Adaptation Method”, WACV (2021).
[5] Kundu et al., “Universal Source-Free Domain Adaptation”, CVPR (2020).
[6] Kundu et al., “Towards Inheritable Models for Open-Set Domain Adaptation”, CVPR (2020).
[7] Wang et al., “Adversarial Learning of Portable Student Networks”, AAAI (2018).
[8] Chen et al., “Distilling Portable Generative Adversarial Networks for Image Translation”, AAAI (2020).
[9] Wang et al., “KDGAN: Knowledge Distillation with Generative Adversarial Networks”, NeurIPS (2018).
[10] Chang et al., “TinyGAN: Distilling BigGAN for Conditional Image Generation”, ACCV (2020).
[11] Aguinaldo et al., “Compressing GANs using Knowledge Distillation”, arXiv:1902.00159 (2019).
[12] Isola et al., “Image-to-Image Translation with Conditional Adversarial Networks”, CVPR (2017).
[13] Lin et al., “Anycost GANs for Interactive Image Synthesis and Editing”, CVPR (2021).
[14] Sankaranarayanan et al., “Generate To Adapt: Aligning Domains using Generative Adversarial Networks”, CVPR (2018).
[15] Li et al., “GAN Compression: Efficient Architectures for Interactive Conditional GANs”, CVPR (2020).
[16] Li et al., “Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation”, ECCV (2020).
[17] Lopes et al., “Data-Free Knowledge Distillation for Deep Neural Networks”, NeurIPS Workshop on Learning with Limited Data (2017).
[18] Wang et al., “MineGAN: Effective Knowledge Transfer from GANs to Target Domains with Few Images”, CVPR (2020).