Knowledge Transfer in
Generative Adversarial Nets
Best Viewed at https://sites.google.com/andrew.cmu.edu/multigan-distillation (CMU access)
Code available here
Knowledge Transfer has practical benefits
A recent trend has emerged wherein corporate organizations with access to internet-scale data and compute propose large state-of-the-art models without releasing the proprietary training dataset. Such models are often difficult for an end-user to reproduce due to inadequate data, insufficient computational resources, or a lack of domain knowledge. Moreover, deploying such large state-of-the-art models demands a substantial computational budget. To this end, a spectrum of Transfer Learning tools [1, 3, 5, 6, 9] has been proposed that enables reusing publicly available pre-trained models to efficiently train smaller models for the task at hand.
Figure 1. Knowledge transfer has practical benefits
There are several scenarios in which one might consider Transfer Learning across GANs, as shown in Fig. 1:
Limited Data: When training data is limited, a pre-trained source GAN could be used to bootstrap the training of a target GAN, with the hope that a better initialization would mitigate issues such as mode collapse (see the sketch after this list).
Multiple GANs: If knowledge is transferred successfully, one could also fuse multiple GANs into a single GAN that generates their combined distribution.
Incremental Data: The ability to inherit knowledge can aid the incremental training of GANs, e.g., by adding new modes to the generator's distribution.
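As a concrete illustration of the Limited Data scenario, below is a minimal PyTorch sketch of bootstrapping: a target generator is initialized from a source GAN's weights and then fine-tuned on the small target dataset. The architecture and checkpoint path are hypothetical placeholders, not the method studied in this work.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """A toy generator; any architecture shared by source and target works."""
    def __init__(self, z_dim=128, img_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Initialize the target GAN from the pre-trained source weights instead of
# from scratch, so it inherits the source model's inductive bias.
G_target = Generator()
G_target.load_state_dict(torch.load("source_generator.pt"))  # hypothetical checkpoint

# Fine-tune on the limited target data, typically with a reduced learning rate.
opt_G = torch.optim.Adam(G_target.parameters(), lr=1e-4, betas=(0.5, 0.999))
```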
GANs can be distilled
While the Knowledge Distillation [1] framework has been extensively explored for classification (and regression) tasks, very few works study distillation for image generation [7, 8, 10]. Although Generative Adversarial Nets have shown promising results [2, 12], computational constraints prevent their deployment on mobile devices or in interactive image editing [13] use-cases. It is therefore highly practical to consider distillation for such models: training simple GANs by transferring the inductive bias captured by large pre-trained GANs.
Intuitively, GAN distillation is a well-posed task because a simple brute-force solution exists: generate a large collection of images using multiple (teacher) GANs, and subsequently train a single (student) GAN on this crafted dataset. However, this would require an arbitrarily large amount of generated data to truly model the image distribution and mitigate the effect of sampling bias. Moreover, conventional distillation techniques such as Euclidean matching do not work well in image space because such methods fail to capture the spatial structure of images [8].
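The brute-force baseline above can be made concrete with a short sketch. Assuming a list of hypothetical pre-trained teacher generators, samples from each teacher are pooled into a single crafted dataset on which a student GAN would then be trained with a standard adversarial loss.

```python
import torch

@torch.no_grad()
def build_synthetic_dataset(teachers, n_per_teacher=50000, z_dim=128):
    """Sample each teacher GAN and pool the images into one dataset."""
    samples = []
    for G in teachers:
        z = torch.randn(n_per_teacher, z_dim)
        samples.append(G(z).cpu())
    return torch.cat(samples, dim=0)

# e.g., fake_data = build_synthetic_dataset([G_faces, G_cars])
# A student GAN is then trained on fake_data as if it were real data.
# Caveat (as noted above): faithfully covering the teachers' combined
# distribution may require an arbitrarily large n_per_teacher.
```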
Goal
In this work, we study an effective method to distill knowledge across GANs. This has several practical uses: one can learn smaller, deployment-friendly GANs by distillation; compress multiple datasets into a single GAN; or improve the training of a target GAN by augmenting its dataset with synthetic samples.
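As an illustration of the last point, here is a minimal sketch of synthetic augmentation, assuming a hypothetical pre-trained generator G_pretrained that maps latent vectors to images; real_images is any tensor of real training images.

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset

def augment_with_gan(real_images, G_pretrained, n_synthetic, z_dim=128):
    """Append GAN samples to a (small) real dataset before training."""
    with torch.no_grad():
        fake_images = G_pretrained(torch.randn(n_synthetic, z_dim))
    return ConcatDataset([TensorDataset(real_images),
                          TensorDataset(fake_images)])
```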
References
[1] Hinton et al, “Distilling the Knowledge in a Neural Network”, NeurIPS Deep Learning and Representation Learning Workshop (2015).
[2] Karras et al, “Analyzing and Improving the Image Quality of StyleGAN”, arXiv:1912.04958 (2020).
[3] Addepalli et al, “DeGAN: Data-Enriching GAN for Retrieving Representative Samples”, AAAI (2020).
[4] Kurmi et al, “Domain Impression: A Source Data Free Domain Adaptation Method”, WACV (2021).
[5] Kundu et al, “Universal Source-Free Domain Adaptation”, CVPR (2020).
[6] Kundu et al, “Towards Inheritable Models for Open-Set Domain Adaptation”, CVPR (2020).
[7] Wang et al, “Adversarial Learning of Portable Student Networks”, AAAI (2018).
[8] Chen et al, “Distilling Portable Generative Adversarial Networks for Image Translation”, AAAI (2020).
[9] Wang et al, “KDGAN: Knowledge Distillation with Generative Adversarial Networks”, NeurIPS (2018).
[10] Chang et al, “TinyGAN: Distilling BigGAN for Conditional Image Generation”, ACCV (2020).
[11] Aguinaldo et al, “Compressing GANs using Knowledge Distillation”, arXiv:1902.00159 (2019).
[12] Isola et al, “Image-to-Image Translation with Conditional Adversarial Nets”, CVPR (2017).
[13] Lin et al, “Anycost GANs for Interactive Image Synthesis and Editing”, CVPR (2021).
[14] Sankaranarayanan et al, “Generate To Adapt: Aligning Domains using Generative Adversarial Networks”, CVPR (2018).
[15] Li et al, “GAN Compression: Efficient Architectures for Interactive Conditional GANs”, CVPR (2020).
[16] Li et al, “Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation”, ECCV (2020).
[17] Lopes et al, “Data-Free Knowledge Distillation for Deep Neural Networks”, NeurIPS Workshop on Learning with Limited Data (2017).
[18] Wang et al, “MineGAN: Effective Knowledge Transfer from GANs to Target Domains with Few Images”, CVPR (2020).