Description

Recent successes in image generation and natural language processing open up new possibilities for text-to-image generation. For example, DM-GAN [1] achieves a high R-Precision, but its Frechet Inception Distance (FID) and Inception Score (IS) remain questionable compared to a conventional generative adversarial network (GAN) trained on the same dataset, and it offers little flexibility in controlling object layouts. In this context, the project aims to generate high-resolution, photo-realistic images from text descriptions using a GAN, primarily on the COCO 2014 [2] and CUB-200-2011 [3] datasets. An initial model will be developed at a smaller scale with a lower-resolution target and then extended to the larger datasets. A rough sketch of such a starting point is given below.
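
To make the starting point concrete, the following is a minimal PyTorch sketch of a text-conditioned generator of the kind the initial low-resolution model could build on: a noise vector is concatenated with a fixed-size sentence embedding and upsampled to a 64x64 image. The class name, dimensions, and layer layout are illustrative assumptions only, not the project's final architecture, and the sentence embedding is assumed to come from some pretrained text encoder.

import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Illustrative conditional generator: concatenates noise with a
    sentence embedding and upsamples to a low-resolution RGB image."""

    def __init__(self, noise_dim=100, text_dim=256, base_channels=64):
        super().__init__()
        self.base_channels = base_channels
        # Project the concatenated (noise, text) vector to a 4x4 feature map.
        self.fc = nn.Linear(noise_dim + text_dim, base_channels * 8 * 4 * 4)
        self.net = nn.Sequential(
            nn.BatchNorm2d(base_channels * 8),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 8, base_channels * 4, 4, 2, 1),  # 4x4 -> 8x8
            nn.BatchNorm2d(base_channels * 4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 4, base_channels * 2, 4, 2, 1),  # 8x8 -> 16x16
            nn.BatchNorm2d(base_channels * 2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 2, base_channels, 4, 2, 1),      # 16x16 -> 32x32
            nn.BatchNorm2d(base_channels),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels, 3, 4, 2, 1),                      # 32x32 -> 64x64
            nn.Tanh(),  # image values in [-1, 1]
        )

    def forward(self, noise, text_embedding):
        # Conditioning: concatenate noise and text features before upsampling.
        x = torch.cat([noise, text_embedding], dim=1)
        x = self.fc(x).view(-1, self.base_channels * 8, 4, 4)
        return self.net(x)

if __name__ == "__main__":
    g = TextConditionedGenerator()
    z = torch.randn(2, 100)   # random noise
    t = torch.randn(2, 256)   # placeholder sentence embeddings
    print(g(z, t).shape)      # torch.Size([2, 3, 64, 64])

A higher-resolution model could then be obtained by stacking further upsampling stages or multiple generator/discriminator pairs, which is the usual route taken by text-to-image GANs such as DM-GAN [1].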


Background Literature

[1] Zhu et al., DM-GAN: Dynamic Memory Generative Adversarial Network for Text-to-Image Synthesis, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5802-5810, 2019.

[2] Lin et al., Microsoft COCO: Common Objects in Context, arXiv:1405.0312, 2015. Available: https://arxiv.org/pdf/1405.0312.pdf

[3] Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset. Available: http://www.vision.caltech.edu/visipedia/CUB-200-20...