GANs [2] have been instrumental in various applications of Image-to-Image translation, where a high-dimensional input needs to be translated to a high-dimensional output. GANs play a minimax game in which the discriminator must single out fake images from real ones; as a result, the generator learns to produce images that resemble the distribution on which the GAN was trained. For this reason, we use GANs as our base model architecture. We will vary the Generator architecture and report the results.
We consider the total loss to be minimized as a sum of Adversarial loss defined in [5] and another Image Similarity Loss which we will define below.
Total Loss = Adversarial Loss + Lambda * (Image_Similarity_Loss)
Image Similarity Loss = Per-Pixel Loss + Perceptual Loss
Both the Per-Pixel Loss and the Perceptual Loss will be calculated using either the L1 norm or the L2 norm.
To find the perceptual loss, we compare the features extracted by a pre-trained network (VGG16) from the ground truth and from our generated image.
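The loss combination above can be sketched as follows. This is a minimal NumPy illustration, not our training code: the `extract_features` callable stands in for the pre-trained VGG16 feature extractor, and the weight name `lam` (the Lambda value) and its default are assumptions for the example.

```python
import numpy as np

def per_pixel_loss(generated, target, norm="l1"):
    # Mean absolute (L1) or mean squared (L2) difference over all pixels.
    diff = generated - target
    return float(np.mean(np.abs(diff)) if norm == "l1" else np.mean(diff ** 2))

def perceptual_loss(generated, target, extract_features, norm="l1"):
    # Compare extracted features rather than raw pixels; `extract_features`
    # is a stand-in for a pre-trained network such as VGG16.
    f_gen, f_tgt = extract_features(generated), extract_features(target)
    diff = f_gen - f_tgt
    return float(np.mean(np.abs(diff)) if norm == "l1" else np.mean(diff ** 2))

def total_loss(adversarial, generated, target, extract_features, lam=100.0):
    # Total Loss = Adversarial Loss + Lambda * (Per-Pixel + Perceptual)
    similarity = (per_pixel_loss(generated, target)
                  + perceptual_loss(generated, target, extract_features))
    return adversarial + lam * similarity
```

When the generated image equals the ground truth, the image similarity term vanishes and only the adversarial term remains.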
We will go over the details of each aspect of our study one by one.
For evaluating the results of our approaches, we will use the following metrics:
1. Turing test –
In this evaluation method, we will present the observers with a set of images and ask them to judge which are generated as opposed to real.
2. Closeness score –
In order to have a quantitative measure of our model's performance, we propose to use the L1 norm between the generated result and the ground truth, averaged over all pixel positions.
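The closeness score described above amounts to a mean absolute pixel difference; a minimal sketch (function name ours):

```python
import numpy as np

def closeness_score(generated, ground_truth):
    # L1 norm between generated image and ground truth, averaged over
    # all pixel positions (and channels). Cast to float to avoid
    # wrap-around with unsigned integer image arrays.
    diff = generated.astype(np.float64) - ground_truth.astype(np.float64)
    return float(np.mean(np.abs(diff)))
```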
Problems of Image-to-Image translation such as Image Colorization have multiple appropriate solutions. The final model may choose a color scheme that appears realistic but is far from the ground truth, so the closeness score is not an ideal measure of performance. However, since we need a quantitative metric for our analysis, we will use this score for all the experiments.
We are using the 17 Flowers dataset available at this link.
For all our experiments, we are keeping the following hyperparameters constant.
ML Architecture : Generative Adversarial Networks (GANs)
Epochs : 100
Learning rate : 0.0002
Learning rate decay schedule : Linearly decay the learning rate from 0.0002 to 0 between epochs 50 and 100
Validation Set Size : 20% of training set
Batch Size : 10
Leaky ReLU negative slope : 0.2
Beta for Adam Optimizer : (0.5, 0.999)
Number of Residual blocks in Resnet : 6
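The learning-rate schedule above (constant for the first 50 epochs, then linear decay to 0 by epoch 100) can be expressed as a small helper; the function name and signature are ours, for illustration only:

```python
def learning_rate(epoch, base_lr=2e-4, total_epochs=100, decay_start=50):
    # Constant base_lr for the first `decay_start` epochs, then decay
    # linearly so the rate reaches 0 at `total_epochs`.
    if epoch < decay_start:
        return base_lr
    return base_lr * (total_epochs - epoch) / (total_epochs - decay_start)
```

For example, the rate is still 0.0002 at epoch 50, halves to 0.0001 at epoch 75, and reaches 0 at epoch 100.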
In our experiments, we vary the following aspects of the study:
1. Loss functions used in training the model [15]
2. Generator Architecture [9] [11]
3. Fusion of features from a pre-trained classifier [6]
4. Lambda value
5. Model complexity in terms of the minimum number of kernels