An Overview of Text to Visual Generation Using GAN
Abstract
Text-to-visual generation was once a cumbersome task until the advent of deep learning. With deep learning networks, both images and videos can now be generated from textual descriptions. Deep learning has revolutionized fields such as computer vision and natural language processing, particularly with the emergence of Generative Adversarial Networks (GANs), which have played a significant role in advancing these domains. A GAN typically comprises multiple deep networks combined with other machine learning techniques. In the context of text-to-visual generation, GANs have enabled the synthesis of images and videos from textual input. This work explores different variations of GANs for image and video synthesis and proposes a general architecture for text-to-visual generation using GANs. Additionally, this study delves into the challenges associated with this task and discusses ongoing research and future prospects. By leveraging deep learning networks and GANs, generating visual content from text has become more accessible and efficient. This work will contribute to the understanding and advancement of text-to-visual generation, paving the way for applications across various industries.
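As a rough illustration of the adversarial setup the abstract describes, the sketch below pits a toy generator against a toy discriminator, with both conditioned on a text embedding as in text-to-visual GANs. All names, shapes, and weights here are illustrative stand-ins, not the architecture proposed in the article; it only shows the standard discriminator loss and the non-saturating generator loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins: a "text embedding" t conditions both networks.
# G maps noise z (plus t) to a fake sample; D scores real/fake given t.
W_g = rng.normal(size=(4, 3))   # generator weights: (noise + text) -> sample
w_d = rng.normal(size=5)        # discriminator weights: (sample + text) -> logit

def G(z, t):
    """Generator: produce a fake sample conditioned on text embedding t."""
    return np.tanh(np.concatenate([z, t]) @ W_g)

def D(x, t):
    """Discriminator: probability that x is a real sample paired with t."""
    return sigmoid(np.concatenate([x, t]) @ w_d)

t = rng.normal(size=2)          # hypothetical text embedding
x_real = rng.normal(size=3)     # a "real" visual sample paired with t
z = rng.normal(size=2)          # noise vector
x_fake = G(z, t)

# Standard GAN objectives:
#   L_D = -[log D(x_real, t) + log(1 - D(x_fake, t))]
#   L_G = -log D(x_fake, t)    (non-saturating generator loss)
loss_d = -(np.log(D(x_real, t)) + np.log(1.0 - D(x_fake, t)))
loss_g = -np.log(D(x_fake, t))

print(f"discriminator loss: {loss_d:.4f}, generator loss: {loss_g:.4f}")
```

In practice the two networks are deep convolutional or attentional models trained alternately by gradient descent on these losses, but the adversarial objective itself is the small computation shown here.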
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.