ULMFiT: Universal Language Model Fine-Tuning for Text Classification

Herat Joshi
Shenson Joseph

Abstract

While inductive transfer learning has revolutionized computer vision, current approaches in natural language processing still require training from scratch and task-specific modifications. We present Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method applicable to any NLP task, and outline techniques that are key to fine-tuning a language model. Our method significantly outperforms the state of the art on six text classification tasks, reducing error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data. Our pretrained models and code are publicly available.
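One of the fine-tuning techniques described in the cited ULMFiT paper (Howard & Ruder, 2018) is the slanted triangular learning rate schedule, which increases the learning rate linearly over a short warm-up phase and then decays it linearly over the remaining iterations. A minimal sketch in Python follows; the function name is illustrative, and the default hyperparameters (`cut_frac=0.1`, `ratio=32`) are the values reported in that paper:

```python
import math

def stlr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular learning rate for iteration t of T total
    iterations (after Howard & Ruder, 2018).

    cut_frac: fraction of iterations spent increasing the rate.
    ratio:    how much larger lr_max is than the lowest rate.
    """
    cut = math.floor(T * cut_frac)          # iteration where lr peaks
    if t < cut:
        p = t / cut                         # linear warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

# The rate starts at lr_max / ratio, peaks at lr_max, then decays back.
schedule = [stlr(t, T=1000) for t in range(1000)]
```

In practice this schedule is combined with the paper's other strategies, such as discriminative fine-tuning, where each layer of the language model receives its own, smaller learning rate.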


Article Details

How to Cite
[1]
Herat Joshi and Shenson Joseph, Trans., “ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, IJAMST, vol. 4, no. 6, pp. 1–9, Oct. 2024, doi: 10.54105/ijamst.E3049.04061024.
Section
Articles

References

Adeborna, E., & Siau, K. (2014). An Approach to Sentiment Analysis – The Case of Airline Quality Rating. PACIS 2014 Proceedings, Paper 363. http://aisel.aisnet.org/pacis2014/363

Vo, H. T., Lam, H. C., Nguyen, D. D., & Tuong, N. H. (2016). Topic classification and sentiment analysis for Vietnamese education survey system. Asian Journal of Computer Science and Information Technology, 6(3). Innovative Journal. https://doi.org/10.15520/ajcsit.v6i3.44

Sarkar, S., Seal, T., & Bandyopadhyay, S. K. (2016). Sentiment analysis - An objective view. Journal for Research, 2(2), 26-29. https://www.researchgate.net/publication/328610677_Sentiment_Analysis-An_Objective_View

Joseph, S., Joshi, H., Hassan, M. M., & Bairagi, A. K. (2024, July 1). Advancing Quantum Machine Learning: From Theoretical Concepts to Experimental Implementations. SSRN. https://ssrn.com/abstract=4946682 or http://dx.doi.org/10.2139/ssrn.4946682

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J Mach Learn Res 15, 1929-1958. http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

Joshi, H. (2022). Navigating the intersection of machine learning and healthcare: A review of current applications. International Journal of Advanced Research in Computer and Communication Engineering, 11(10). https://doi.org/10.17148/IJARCCE.2022.111016

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/

Zagoruyko, S., & Komodakis, N. (2016). Wide Residual Networks. In Proceedings of the British Machine Vision Conference 2016. https://doi.org/10.5244/C.30.87

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2016.90

Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/p18-1031

Joshi, H., Joseph, S., & Shukla, P. (2024). Unlocking Potential. In Advances in Medical Technologies and Clinical Practice (pp. 313–341). IGI Global. https://doi.org/10.4018/979-8-3693-5893-1.ch016

Peters, M. E., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. arXiv preprint arXiv:1705.00108. https://doi.org/10.18653/v1/p17-1161

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1310.1531

Long, J., Shelhamer, E., & Darrell, T. (2014). Fully Convolutional Networks for Semantic Segmentation (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1411.4038

Sarhan, I., & Spruit, M. (2020). Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction. In Applied Sciences (Vol. 10, Issue 17, p. 5758). MDPI AG. https://doi.org/10.3390/app10175758

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2009.5206848

McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in Translation: Contextualized Word Vectors (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1708.00107

Mou, L., Peng, H., Li, G., Xu, Y., Zhang, L., & Jin, Z. (2015). Discriminative Neural Sentence Modeling by Tree-Based Convolution (Version 5). arXiv. https://doi.org/10.48550/ARXIV.1504.01106

Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., & Xu, B. (2016). Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1611.06639

Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level Convolutional Networks for Text Classification (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1509.01626

Johnson, R., & Zhang, T. (2016). Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1602.02373

Johnson, R., & Zhang, T. (2017). Deep Pyramid Convolutional Neural Networks for Text Categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/p17-1052

Patel, V. A., & Joshi, M. V. (2018). Convolutional neural network with transfer learning for rice type classification. In J. Zhou, P. Radeva, D. Nikolaev, & A. Verikas (Eds.), Tenth International Conference on Machine Vision (ICMV 2017). SPIE. https://doi.org/10.1117/12.2309482

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. https://doi.org/10.3115/v1/d14-1162

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1310.4546

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1607.04606

Rei, M. (2017). Semi-supervised multitask learning for sequence labeling. arXiv preprint arXiv:1704.07156. https://arxiv.org/pdf/1704.07156

Joshi, H. (2024). Artificial Intelligence in Project Management: A Study of The Role of Ai-Powered Chatbots in Project Stakeholder Engagement. In Indian Journal of Software Engineering and Project Management (Vol. 4, Issue 1, pp. 20–25). Lattice Science Publication (LSP). https://doi.org/10.54105/ijsepm.b9022.04010124

Kumbhakarna, V. M., Kulkarni, S. B., & Dhawale, A. D. (2020). NLP Algorithms Endowed for Automatic Extraction of Information from Unstructured Free Text Reports of Radiology Monarchy. In International Journal of Innovative Technology and Exploring Engineering (Vol. 9, Issue 12, pp. 338–343). https://doi.org/10.35940/ijitee.l8009.1091220

Chellatamilan, T., Valarmathi, B., & Santhi, K. (2020). Research Trends on Deep Transformation Neural Models for Text Analysis in NLP Applications. In International Journal of Recent Technology and Engineering (IJRTE) (Vol. 9, Issue 2, pp. 750–758). https://doi.org/10.35940/ijrte.b3838.079220

Hudaa, S., Setiyadi, D. B. P., Lydia, E. L., Shankar, K., Nguyen, P. T., Hashim, W., & Maseleno, A. (2019). Natural Language Processing utilization in Healthcare. In International Journal of Engineering and Advanced Technology (Vol. 8, Issue 6s2, pp. 1117–1120). https://doi.org/10.35940/ijeat.f1305.0886s219
