An Overview of Text to Visual Generation Using GAN
Abstract
Text-to-visual generation was once a cumbersome task until the advent of deep learning. With deep learning networks, both images and videos can now be generated from textual descriptions. Deep learning has revolutionized fields such as computer vision and natural language processing, particularly with the emergence of Generative Adversarial Networks (GANs), which have played a significant role in advancing these domains. A GAN typically comprises multiple deep networks combined with other machine learning techniques. In the context of text-to-visual generation, GANs have enabled the synthesis of images and videos from textual input. This work explores different variations of GANs for image and video synthesis and proposes a general architecture for text-to-visual generation using GANs. Additionally, this study delves into the challenges associated with this task and discusses ongoing research and future prospects. By leveraging deep learning networks and GANs, generating visual content from text has become more accessible and efficient. This work will contribute to the understanding and advancement of text-to-visual generation, paving the way for applications across various industries.
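As a rough illustration of the adversarial setup the abstract describes, the sketch below pits a toy generator against a toy discriminator, with both conditioned on a text embedding as in text-to-visual GANs. All names, shapes, and weights here are illustrative stand-ins, not the architecture proposed in the article; it only shows the standard discriminator loss and the non-saturating generator loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins: a "text embedding" t conditions both networks.
# G maps noise z (plus t) to a fake sample; D scores real/fake given t.
W_g = rng.normal(size=(4, 3))   # generator weights: (noise + text) -> sample
w_d = rng.normal(size=5)        # discriminator weights: (sample + text) -> logit

def G(z, t):
    """Generator: produce a fake sample conditioned on text embedding t."""
    return np.tanh(np.concatenate([z, t]) @ W_g)

def D(x, t):
    """Discriminator: probability that x is a real sample paired with t."""
    return sigmoid(np.concatenate([x, t]) @ w_d)

t = rng.normal(size=2)          # hypothetical text embedding
x_real = rng.normal(size=3)     # a "real" visual sample paired with t
z = rng.normal(size=2)          # noise vector
x_fake = G(z, t)

# Standard GAN objectives:
#   L_D = -[log D(x_real, t) + log(1 - D(x_fake, t))]
#   L_G = -log D(x_fake, t)    (non-saturating generator loss)
loss_d = -(np.log(D(x_real, t)) + np.log(1.0 - D(x_fake, t)))
loss_g = -np.log(D(x_fake, t))

print(f"discriminator loss: {loss_d:.4f}, generator loss: {loss_g:.4f}")
```

In practice the two networks are deep convolutional or attentional models trained alternately by gradient descent on these losses, but the adversarial objective itself is the small computation shown here.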
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.