Introduction:
I am currently working on this project with Dr. Adway Mitra from IIT Kharagpur. Variational Autoencoders (Kingma & Welling, 2013) are widely used for data generation. In general, they are good at generating random data; with the addition of some constraints, however, we can steer the generation in a desired way. The decoder can be conditioned on several attributes, based on which data is subsequently generated, e.g., generating a face image conditioned on blond hair, no beard, etc.
VAEs as described in the original work used a constant variance for the decoder, which led to a lack of variety in the generated images. Rybkin et al. (2020) proposed keeping the variance as a learnable variable during the optimization process in order to get variety in the generated images, labelling the resulting model the sigma-VAE.
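For concreteness, a minimal PyTorch sketch of such a reconstruction term, assuming a single log-variance parameter shared across all pixels (Rybkin et al. also discuss per-image and analytically optimal variants; the class and parameter names here are ours):

import math

import torch
import torch.nn as nn

class GaussianNLL(nn.Module):
    """Gaussian reconstruction loss with a learnable decoder variance.
    A single shared log-sigma is an illustrative assumption; the sigma-VAE
    paper also covers per-image and analytically optimal variants."""
    def __init__(self):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(()))  # learned jointly with the VAE

    def forward(self, x_hat, x):
        # Negative log-likelihood of x under N(x_hat, sigma^2 I),
        # summed over pixels and averaged over the batch.
        var = torch.exp(2 * self.log_sigma)
        nll = 0.5 * ((x - x_hat) ** 2 / var
                     + 2 * self.log_sigma + math.log(2 * math.pi))
        return nll.flatten(1).sum(dim=1).mean()

During training this term replaces the fixed-variance reconstruction loss; the learned sigma calibrates the balance between the reconstruction and KL terms.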
Setup:
We set up the models using the PyTorch framework and experiment on the MNIST and CelebA datasets. The conditionals for the MNIST dataset are the class labels. For the CelebA dataset, we use the attribute annotations, which consist of binary values for different facial traits, e.g., male/female, blond hair/black hair, lipstick/no lipstick, etc. This allows us to generate images with single and multiple attributes, as sketched below.
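A sketch of how the condition vectors can be built (the CelebA attribute indices below are assumptions for illustration and should be checked against the dataset's annotation order):

import torch
import torch.nn.functional as F

# MNIST: class labels become one-hot condition vectors (10 classes).
labels = torch.tensor([3, 7, 1])
cond_mnist = F.one_hot(labels, num_classes=10).float()

# CelebA: the 40 binary attribute annotations serve directly as the
# condition vector; setting several entries at once conditions the
# generation on multiple traits simultaneously.
cond_celeba = torch.zeros(3, 40)
cond_celeba[:, 9] = 1.0   # e.g. "Blond_Hair" (index assumed for illustration)
cond_celeba[:, 20] = 1.0  # e.g. "Male" (index assumed for illustration)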
The encoder and decoder use a convolutional and a de-convolutional architecture, respectively. The conditionals are scaled and concatenated with the input as well as with the latent code, following the original objective of the conditional variational autoencoder; a minimal sketch of this design follows.
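The sketch below targets 28x28 MNIST images; the layer sizes and latent dimension are illustrative assumptions, not our exact architecture:

import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE sketch. The condition vector c is
    concatenated with the input image (as extra channels) and with
    the latent code, as in the standard CVAE formulation."""
    def __init__(self, cond_dim=10, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1 + cond_dim, 32, 4, 2, 1), nn.ReLU(),  # 28 -> 14
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),            # 14 -> 7
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(64 * 7 * 7, latent_dim)
        self.fc_dec = nn.Linear(latent_dim + cond_dim, 64 * 7 * 7)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),    # 7 -> 14
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Sigmoid(),  # 14 -> 28
        )

    def forward(self, x, c):
        # Broadcast c over the spatial dims and concatenate as channels.
        c_map = c[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        h = self.encoder(torch.cat([x, c_map], dim=1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # Concatenate c with the latent code before decoding.
        h_dec = self.fc_dec(torch.cat([z, c], dim=1)).view(-1, 64, 7, 7)
        return self.decoder(h_dec), mu, logvar

Feeding the condition to both the encoder and the decoder mirrors the CVAE objective, in which both q(z|x, c) and p(x|z, c) receive the conditioning information.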
Explainability:
We are currently working on Layer-Wise Relevance Propagation (LRP) to identify the conditional attributes that are responsible for the generation of certain images. We also plan to use activation maximization to observe the features that maximize the neuron activations driving the generated images; a sketch of that step is given below. These methods can then be compared against cross-attention maps between the generated images and the attributes, to contrast the explanatory power of the three approaches.
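A hedged sketch of the activation-maximization step (the decoder interface, the forward hook, and the sigmoid squashing of the condition vector are all our assumptions for illustration, not our finalized procedure):

import torch

def activation_max(decoder, layer, unit, latent_dim, cond_dim,
                   steps=200, lr=0.05):
    """Gradient-ascent sketch: find a latent z and condition c that
    maximize the mean activation of one unit in a chosen decoder layer.
    Assumes a decoder that takes the concatenated [z, c] vector."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    c = torch.zeros(1, cond_dim, requires_grad=True)
    acts = {}
    hook = layer.register_forward_hook(
        lambda module, inputs, output: acts.update(a=output))
    opt = torch.optim.Adam([z, c], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        decoder(torch.cat([z, torch.sigmoid(c)], dim=1))  # sigmoid keeps c in [0, 1]
        loss = -acts["a"][0, unit].mean()  # ascend on the chosen unit's activation
        loss.backward()
        opt.step()
    hook.remove()
    return z.detach(), torch.sigmoid(c).detach()

Inspecting the optimized condition vector then indicates which attributes the chosen unit is most sensitive to, which can be set against the LRP relevances and the cross-attention maps.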