Description

Recent successes in image generation and natural language processing open up new possibilities for text-to-image generation. For example, DM-GAN [1] achieves a high R-Precision, but its Frechet Inception Distance (FID) and Inception Score (IS) remain questionable compared to a conventional generative adversarial network (GAN) trained on the same dataset, and it offers little flexibility in controlling object layouts. In this context, the project aims to generate high-resolution, photo-realistic images from text descriptions using a GAN, primarily on the COCO 2014 [2] and CUB-200-2011 [3] datasets. An initial model will be developed at a smaller scale with a lower-resolution target and then extended to the larger datasets. A rough sketch of such a starting point is given below.
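
To make the starting point concrete, the following is a minimal PyTorch sketch of a text-conditioned generator of the kind the initial low-resolution model could build on: a noise vector is concatenated with a fixed-size sentence embedding and upsampled to a 64x64 image. The class name, dimensions, and layer layout are illustrative assumptions only, not the project's final architecture, and the sentence embedding is assumed to come from some pretrained text encoder.

import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Illustrative conditional generator: concatenates noise with a
    sentence embedding and upsamples to a low-resolution RGB image."""

    def __init__(self, noise_dim=100, text_dim=256, base_channels=64):
        super().__init__()
        self.base_channels = base_channels
        # Project the concatenated (noise, text) vector to a 4x4 feature map.
        self.fc = nn.Linear(noise_dim + text_dim, base_channels * 8 * 4 * 4)
        self.net = nn.Sequential(
            nn.BatchNorm2d(base_channels * 8),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 8, base_channels * 4, 4, 2, 1),  # 4x4 -> 8x8
            nn.BatchNorm2d(base_channels * 4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 4, base_channels * 2, 4, 2, 1),  # 8x8 -> 16x16
            nn.BatchNorm2d(base_channels * 2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 2, base_channels, 4, 2, 1),      # 16x16 -> 32x32
            nn.BatchNorm2d(base_channels),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels, 3, 4, 2, 1),                      # 32x32 -> 64x64
            nn.Tanh(),  # image values in [-1, 1]
        )

    def forward(self, noise, text_embedding):
        # Conditioning: concatenate noise and text features before upsampling.
        x = torch.cat([noise, text_embedding], dim=1)
        x = self.fc(x).view(-1, self.base_channels * 8, 4, 4)
        return self.net(x)

if __name__ == "__main__":
    g = TextConditionedGenerator()
    z = torch.randn(2, 100)   # random noise
    t = torch.randn(2, 256)   # placeholder sentence embeddings
    print(g(z, t).shape)      # torch.Size([2, 3, 64, 64])

A higher-resolution model could then be obtained by stacking further upsampling stages or multiple generator/discriminator pairs, which is the usual route taken by text-to-image GANs such as DM-GAN [1].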


Background Literature

[1] Zhu et al., DM-GAN: Dynamic Memory Generative Adversarial Network for Text-to-Image Synthesis, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5802-5810, 2019.

[2] Lin et al., Microsoft COCO: Common Objects in Context, arXiv:1405.0312, 2015. Available: https://arxiv.org/pdf/1405.0312.pdf

[3] Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset. Available: http://www.vision.caltech.edu/visipedia/CUB-200-20...