Link to the tutorial => https://medium.com/@ekss1121/generative-adversarial-networks-b9f80e6d7679
Generator:
Layer (type) | Output Shape | Param #
input_1 (InputLayer) | (None, 100) | 0
dense_1 (Dense) | (None, 6272) | 633472
leaky_re_lu_1 (LeakyReLU) | (None, 6272) | 0
reshape_1 (Reshape) | (None, 7, 7, 128) | 0
up_sampling2d_1 (UpSampling2D) | (None, 14, 14, 128) | 0
conv2d_1 (Conv2D) | (None, 14, 14, 64) | 204864
leaky_re_lu_2 (LeakyReLU) | (None, 14, 14, 64) | 0
up_sampling2d_2 (UpSampling2D) | (None, 28, 28, 64) | 0
conv2d_2 (Conv2D) | (None, 28, 28, 1) | 1601
activation_1 (Activation) | (None, 28, 28, 1) | 0
Discriminator:
Layer (type) | Output Shape | Param #
input_2 (InputLayer) | (None, 28, 28, 1) | 0
conv2d_3 (Conv2D) | (None, 14, 14, 64) | 1664
leaky_re_lu_3 (LeakyReLU) | (None, 14, 14, 64) | 0
dropout_1 (Dropout) | (None, 14, 14, 64) | 0
conv2d_4 (Conv2D) | (None, 7, 7, 128) | 204928
leaky_re_lu_4 (LeakyReLU) | (None, 7, 7, 128) | 0
dropout_2 (Dropout) | (None, 7, 7, 128) | 0
flatten_1 (Flatten) | (None, 6272) | 0
dense_2 (Dense) | (None, 2) | 12546
Link => https://arxiv.org/pdf/1611.07004.pdf
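As a sanity check on the tutorial's Keras summaries above, the parameter counts can be reproduced by hand. The counts only work out if both convolutional stacks use 5x5 kernels; that kernel size is inferred from the numbers, not stated in the summary. A minimal pure-Python sketch:

```python
def dense_params(n_in, n_out):
    # one weight per input/output pair plus one bias per output
    return n_in * n_out + n_out


def conv2d_params(c_in, c_out, k):
    # one k x k kernel per (input channel, output channel) pair, plus one bias per output channel
    return k * k * c_in * c_out + c_out


# Generator: 100-d noise -> 7x7x128 -> 14x14x64 -> 28x28x1 (5x5 kernels assumed)
generator = {
    "dense_1": dense_params(100, 7 * 7 * 128),
    "conv2d_1": conv2d_params(128, 64, 5),
    "conv2d_2": conv2d_params(64, 1, 5),
}

# Discriminator: 28x28x1 -> 14x14x64 -> 7x7x128 -> flatten -> 2 outputs
discriminator = {
    "conv2d_3": conv2d_params(1, 64, 5),
    "conv2d_4": conv2d_params(64, 128, 5),
    "dense_2": dense_params(7 * 7 * 128, 2),
}

for name, model in (("generator", generator), ("discriminator", discriminator)):
    for layer, n in model.items():
        print(f"{name}.{layer}: {n}")
    print(f"{name} trainable total: {sum(model.values())}")
```

Layers with 0 parameters (LeakyReLU, Reshape, UpSampling2D, Dropout, Flatten, Activation) only change shapes or values, so they are left out of the sums.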
Requirements
1. Condition the output on an input image, which is to be converted from one font to another
2. Convert a word from one font to another
Hence we move to the pix2pix model, which uses an encoder-decoder network to take the input image and produce the output in a different font.
Why use GANs (according to the paper)?
Minimizing the Euclidean distance between predicted and ground-truth pixels produces blurry images, because that loss is minimized by averaging all plausible outputs; we need sharp, realistic images.
Hence we use GANs, whose goal is to "make the output indistinguishable from reality" (blurry images will be classified as fake).
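A toy illustration of the blurring argument, using a hypothetical 4-pixel strip instead of a real image: if a stroke one pixel to the left and a stroke one pixel to the right are equally plausible outputs for the same input, the prediction that minimizes the expected squared (Euclidean) error is their per-pixel mean, a smeared half-intensity stroke that beats either sharp answer on that loss.

```python
# Two equally plausible sharp targets for the same input (hypothetical 4-pixel strips)
targets = [
    [0.0, 1.0, 0.0, 0.0],  # stroke at pixel 1
    [0.0, 0.0, 1.0, 0.0],  # stroke at pixel 2
]


def expected_l2(pred, targets):
    # average over the plausible targets of the squared Euclidean distance
    return sum(sum((p - t) ** 2 for p, t in zip(pred, tgt)) for tgt in targets) / len(targets)


# The L2-optimal prediction is the per-pixel mean of the plausible targets
mean_pred = [sum(col) / len(targets) for col in zip(*targets)]

print(mean_pred)                         # [0.0, 0.5, 0.5, 0.0]: blurry
print(expected_l2(mean_pred, targets))   # 0.5
print(expected_l2(targets[0], targets))  # 1.0: the sharp answer is penalized more
```

A discriminator can learn that half-intensity strokes never occur in real data, which is the extra signal a per-pixel loss alone does not provide.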
Discriminator:
(0): Conv2d(6, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): LeakyReLU(0.2, inplace)
(2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(4): LeakyReLU(0.2, inplace)
(5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(7): LeakyReLU(0.2, inplace)
(8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
(9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(10): LeakyReLU(0.2, inplace)
(11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
(12): Sigmoid()
About the architecture
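The discriminator listed above is a PatchGAN: it outputs a grid of real/fake scores, one per image patch, instead of a single score per image. The 6 input channels presumably come from concatenating the conditioning image and the real or generated output (3 channels each). A small pure-Python sketch that computes the size of the score grid and the receptive field of each score from the listed kernel sizes, strides and paddings, assuming a 256x256 input (the pix2pix default; our word images may have a different size):

```python
# (kernel, stride, padding) of the five Conv2d layers listed above
convs = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def output_size(size, layers):
    # standard convolution arithmetic: floor((n + 2p - k) / s) + 1, applied layer by layer
    for k, s, p in layers:
        size = (size + 2 * p - k) // s + 1
    return size


def receptive_field(layers):
    # walk back from one output unit to the input: r <- (r - 1) * s + k
    r = 1
    for k, s, _ in reversed(layers):
        r = (r - 1) * s + k
    return r


print(output_size(256, convs))  # 30 -> a 30x30 grid of patch scores
print(receptive_field(convs))   # 70 -> each score looks at a 70x70 input patch
```

BatchNorm and LeakyReLU do not change spatial size, so only the convolutions matter here.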
Link => https://arxiv.org/abs/1703.10593
Why use CycleGAN?
Cyclic consistency loss: $$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]$$
3. Unlike pix2pix, which requires paired images, CycleGAN does not; it trains on unpaired images from both sets (words in different fonts).
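A minimal sketch of the cycle-consistency term, assuming images are flat lists of pixel values and using hypothetical toy mappings G and F in place of the two CycleGAN generators. The expectations are approximated by averages over a batch, and the L1 norm by a per-pixel mean absolute difference:

```python
def l1(a, b):
    # mean absolute per-pixel difference between two flattened images
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    # forward cycle: x -> G(x) -> F(G(x)) should come back to x
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    # backward cycle: y -> F(y) -> G(F(y)) should come back to y
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical toy mappings on pixel values in [0, 1]:
# G brightens by 0.2, F darkens by 0.2, and both clip to the valid range
def G(img):
    return [min(1.0, p + 0.2) for p in img]


def F(img):
    return [max(0.0, p - 0.2) for p in img]


xs = [[0.1, 0.9]]  # the 0.9 pixel is clipped by G, so F cannot undo it
ys = [[0.5, 0.0]]  # the 0.0 pixel is clipped by F, so G cannot undo it
print(cycle_consistency_loss(G, F, xs, ys))  # approximately 0.15
```

Pixels that survive the round trip contribute nothing; only information destroyed by G or F (here, by clipping) is penalized.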
About the architecture
TODO:
SimGAN
More results can be found at this link -> https://web.iiit.ac.in/~shubh.maheshwari/test_12/index.html
We ran the same experiment with CycleGAN using a much larger data-set.
One possible reason =>
We also think this model is not a good fit for changing the shape of an object. We tried using it to convert a man's face into a look-alike woman's face, using the CelebA dataset, but the results were poor and the generated images were quite distorted.
https://hardikbansal.github.io/CycleGANBlog/
4. The training and test data-sets were all unique: no words were repeated.
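One way to build and check such a split, sketched under the assumption that each sample is identified by its word string (the word list and the 80/20 ratio below are hypothetical): split the deduplicated vocabulary rather than the individual images, so no word can land in both sets.

```python
import random


def split_by_word(words, test_fraction=0.2, seed=0):
    # deduplicate first so a repeated word cannot end up on both sides
    vocab = sorted(set(words))
    random.Random(seed).shuffle(vocab)
    n_test = int(len(vocab) * test_fraction)
    return vocab[n_test:], vocab[:n_test]


words = ["font", "style", "glyph", "kerning", "serif", "font",
         "italic", "bold", "ligature", "stroke", "style"]
train, test = split_by_word(words)

# the property we want: no word shared between train and test
assert not set(train) & set(test), "train and test share words"
print(len(train), len(test))  # 8 1
```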
To understand the issue with converting capital letters, we ran the same experiment, but only for conversion from one font to another.
The results are not as promising as those for lowercase letters, but it's a start.
More results => https://web.iiit.ac.in/~shubh.maheshwari/test_latest/index.html
1. One way to do this is to use z together with A to generate B, and then use an encoder to recover z^ from B (z -> B -> z^).
2. By adding an L1 loss between z and z^ to the losses for G and D (Figure (d)), we can force the generator to make use of the noise.
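A minimal sketch of this latent-regression term, with hypothetical toy stand-ins for the generator G and encoder E (both are networks in BicycleGAN): sample z, generate B from (A, z), encode B back to z^, and take the L1 distance between z and z^. If G ignores z, E has nothing to recover and the term stays large.

```python
import random

Z_DIM = 8  # hypothetical latent size


def latent_regression_loss(G, E, A, rng):
    z = [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]  # sample z ~ N(0, I)
    B_fake = G(A, z)                                  # z -> B (conditioned on A)
    z_hat = E(B_fake)                                 # B -> z^
    # L1 distance between the sampled and the recovered latent code
    return sum(abs(a - b) for a, b in zip(z, z_hat)) / Z_DIM


# Hypothetical toy "networks" on flat lists: the output is A followed by whatever G writes
def G_uses_noise(A, z):
    return A + z              # z is embedded in the output


def G_ignores_noise(A, z):
    return A + [0.0] * len(z)  # same output for every z


def E(B):
    return B[-Z_DIM:]          # read the last Z_DIM values back out


A = [0.5, 0.2]
print(latent_regression_loss(G_uses_noise, E, A, random.Random(0)))     # 0.0
print(latent_regression_loss(G_ignores_noise, E, A, random.Random(0)))  # > 0
```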
1. We can reverse this by applying the encoder E to B to produce Q(z|B), then feeding a z drawn from it, together with A, to G to generate B^ (B -> z -> B^).
2. The authors noted that this method alone did not give good results at test time, when z is sampled from a random Gaussian instead of being encoded from B. Hence they added a KL-divergence penalty between Q(z|B) and the random Gaussian N(0, I), which produced much better results.
3. See Figure (c) for reference.
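In this cVAE-GAN branch, the encoder typically predicts a mean and a log-variance per latent dimension for Q(z|B); z is drawn with the reparameterization trick, and the KL term against N(0, I) has a closed form. A pure-Python sketch (the example mean and log-variance values are arbitrary):

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    # closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework
    # this form lets gradients flow back into mu and logvar
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


rng = random.Random(0)
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))    # 0.0: Q(z|B) already equals the prior
print(kl_to_standard_normal([1.0, -0.5], [0.3, -1.2]))  # > 0: pulled towards N(0, I)
z = reparameterize([1.0, -0.5], [0.3, -1.2], rng)
```

Keeping Q(z|B) close to N(0, I) is what makes it reasonable to sample z from a plain Gaussian at test time.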
If the video doesn't load => https://drive.google.com/open?id=1KrjMRw4oQC1wnEEsFYzpj5DMf7_xmDh7
More results => https://web.iiit.ac.in/~shubh.maheshwari/test_bicycle
- We can see a case of mode collapse, as there is very little variance in the generated images.
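One simple way to put a number on that low variance, as a sketch: generate several outputs for the same input with different z and average the pairwise distance between them; values near zero suggest mode collapse. The BicycleGAN paper measures diversity with the learned LPIPS distance; the plain per-pixel L1 below is a cruder stand-in, and the sample values are made up.

```python
from itertools import combinations


def l1(a, b):
    # mean absolute per-pixel difference between two flattened images
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_pairwise_l1(samples):
    # average distance over all pairs of outputs generated for one input
    pairs = list(combinations(samples, 2))
    return sum(l1(a, b) for a, b in pairs) / len(pairs)


collapsed = [[0.2, 0.8, 0.5], [0.2, 0.8, 0.5], [0.21, 0.8, 0.5]]  # nearly identical outputs
diverse = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.4], [0.5, 0.5, 0.0]]
print(mean_pairwise_l1(collapsed))  # close to 0
print(mean_pairwise_l1(diverse))    # noticeably larger
```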
Report => https://web.iiit.ac.in/~shubh.maheshwari/Visualize_fonts.html
Currently I have only evaluated using PCA.
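For reference, a PCA projection of this kind takes a few lines with NumPy: center the flattened images and project onto the top right-singular vectors. The random array below is a placeholder for real font images. Reporting the explained-variance ratio next to the 2-D plot shows how much of the variation the projection actually keeps.

```python
import numpy as np


def pca_project(images, n_components=2):
    # flatten each image to a row vector and center the data
    X = images.reshape(len(images), -1).astype(np.float64)
    X -= X.mean(axis=0)
    # rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return X @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
images = rng.random((50, 28, 28))  # placeholder: 50 grayscale 28x28 images
coords, ratio = pca_project(images)
print(coords.shape)  # (50, 2)
print(ratio)         # fraction of variance captured by each component
```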
Please recommend more evaluation methods for image comparison.
Link to the meeting -> https://drive.google.com/file/d/1KbcyuZCA9U-ZySEsXovowWJK1NSju5Jh/view
1. Training CycleGAN on a disjoint and larger data-set
2. Training on single-to-multiple font conversion
3. Variable length
4. Class information (e.g. spelling) should be preserved
5. We need some kind of quantitative method to analyse how good the generated fonts are in general (not just for a particular font)
1. We could train an OCR model together with the GAN to analyse how close the generator's output is to the required target space.
2. We could use an already well-trained OCR model to predict the labels. Wrong labels and low-confidence labels would get a bad score.
6. Analyse the training data to check whether there is a bias during training
7. Use trimmed images for the fonts
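The second evaluation idea under point 5 (scoring with an already-trained OCR model) could look roughly like this. It assumes some pretrained OCR model that returns a predicted string and a confidence for each generated word image; the `predictions` list below is hypothetical output, not from any specific OCR library. A wrong prediction scores 0, a correct one scores its confidence, and the character error rate (edit distance divided by target length) gives partial credit for near misses, which also checks point 4 (spelling preserved).

```python
def edit_distance(a, b):
    # Levenshtein distance using a single rolling row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def ocr_scores(predictions, targets):
    # predictions: (predicted_text, confidence) pairs from some pretrained OCR model
    word_scores, cer_total = [], 0.0
    for (text, conf), target in zip(predictions, targets):
        word_scores.append(conf if text == target else 0.0)  # wrong label -> 0
        cer_total += edit_distance(text, target) / max(len(target), 1)
    n = len(targets)
    return sum(word_scores) / n, cer_total / n


targets = ["font", "glyph", "serif"]
predictions = [("font", 0.95), ("glyph", 0.40), ("senif", 0.80)]  # hypothetical OCR output
score, cer = ocr_scores(predictions, targets)
print(score, cer)  # higher score and lower CER are better
```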