ULMFiT: Universal Language Model Fine-Tuning for Text Classification
Abstract
While inductive transfer learning has revolutionized computer vision, current approaches in natural language processing still require training from scratch and task-specific modifications. We present Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method applicable to any NLP task, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state of the art on six text classification tasks, reducing the error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data. We have made our pretrained models and code publicly available.
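The fine-tuning strategies mentioned in the abstract include the slanted triangular learning rate schedule and discriminative (per-layer) fine-tuning described in Howard & Ruder (2018). The sketch below implements those two formulas in plain Python; the function names, default values (eta_max, cut_frac=0.1, ratio=32, layer decay 2.6), and layer-indexing convention follow the paper's reported hyperparameters, but this is an illustrative sketch rather than the authors' released code.

```python
def stlr(t, T, eta_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate (Howard & Ruder, 2018).

    t: current training iteration (0-based), T: total iterations.
    The rate rises linearly for the first cut_frac of training,
    then decays linearly; ratio sets how far below eta_max it starts/ends.
    """
    cut = int(T * cut_frac)
    if t < cut:
        p = t / cut                                   # linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay
    return eta_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(eta_top, n_layers, decay=2.6):
    """Per-layer learning rates for discriminative fine-tuning.

    The top layer trains at eta_top; each lower layer trains at the
    rate of the layer above divided by `decay` (2.6 in the paper).
    Returns rates ordered from the lowest layer to the top layer.
    """
    return [eta_top / decay ** (n_layers - 1 - layer)
            for layer in range(n_layers)]
```

For example, `stlr(0, 1000)` starts at `eta_max / ratio`, peaks at `eta_max` when `t` reaches the cut point (iteration 100 here), and decays back; `discriminative_lrs(0.01, 3)` yields three rates with the top layer at 0.01 and each lower layer 2.6x smaller.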
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
Adeborna, E., & Siau, K. (2014). An Approach to Sentiment Analysis: The Case of Airline Quality Rating. PACIS 2014 Proceedings, Paper 363. http://aisel.aisnet.org/pacis2014/363
Vo, H. T., Lam, H. C., Nguyen, D. D., & Huynh Tuong, N. (2016). Topic classification and sentiment analysis for Vietnamese education survey system. Asian Journal of Computer Science and Information Technology, 6(3). Innovative Journal. https://doi.org/10.15520/ajcsit.v6i3.44
Sarkar, S., Seal, T., & Bandyopadhyay, S. K. (2016). Sentiment analysis - An objective view. Journal for Research, 2(2), 26-29. https://www.researchgate.net/publication/328610677_Sentiment_Analysis-An_Objective_View
Joseph, S., Joshi, H., Hassan, M. M., & Bairagi, A. K. (2024). Advancing Quantum Machine Learning: From Theoretical Concepts to Experimental Implementations. SSRN. https://ssrn.com/abstract=4946682 or https://doi.org/10.2139/ssrn.4946682
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929–1958. http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
Joshi, H. (2022). Navigating the intersection of machine learning and healthcare: A review of current applications. International Journal of Advanced Research in Computer and Communication Engineering, 11(10). https://doi.org/10.17148/IJARCCE.2022.111016
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/
Zagoruyko, S., & Komodakis, N. (2016). Wide Residual Networks. In Proceedings of the British Machine Vision Conference 2016. https://doi.org/10.5244/C.30.87
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2016.90
Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/p18-1031
Joshi, H., Joseph, S., & Shukla, P. (2024). Unlocking Potential. In Advances in Medical Technologies and Clinical Practice (pp. 313–341). IGI Global. https://doi.org/10.4018/979-8-3693-5893-1.ch016
Peters, M. E., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised Sequence Tagging with Bidirectional Language Models. arXiv preprint arXiv:1705.00108. https://doi.org/10.18653/v1/p17-1161
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1310.1531
Long, J., Shelhamer, E., & Darrell, T. (2014). Fully Convolutional Networks for Semantic Segmentation (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1411.4038
Sarhan, I., & Spruit, M. (2020). Can We Survive without Labelled Data in NLP? Transfer Learning for Open Information Extraction. In Applied Sciences (Vol. 10, Issue 17, p. 5758). MDPI AG. https://doi.org/10.3390/app10175758
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. https://doi.org/10.1109/cvpr.2009.5206848
McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in Translation: Contextualized Word Vectors. In Advances in Neural Information Processing Systems. https://doi.org/10.48550/ARXIV.1708.00107
Mou, L., Peng, H., Li, G., Xu, Y., Zhang, L., & Jin, Z. (2015). Discriminative Neural Sentence Modeling by Tree-Based Convolution (Version 5). arXiv. https://doi.org/10.48550/ARXIV.1504.01106
Zhou, P., Qi, Z., Zheng, S., Xu, J., Bao, H., & Xu, B. (2016). Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1611.06639
Zhang, X., Zhao, J., & LeCun, Y. (2015). Character-level Convolutional Networks for Text Classification (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1509.01626
Johnson, R., & Zhang, T. (2016). Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1602.02373
Johnson, R., & Zhang, T. (2017). Deep Pyramid Convolutional Neural Networks for Text Categorization. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. https://doi.org/10.18653/v1/p17-1052
Patel, V. A., & Joshi, M. V. (2018). Convolutional neural network with transfer learning for rice type classification. In J. Zhou, P. Radeva, D. Nikolaev, & A. Verikas (Eds.), Tenth International Conference on Machine Vision (ICMV 2017). SPIE. https://doi.org/10.1117/12.2309482
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. https://doi.org/10.3115/v1/d14-1162
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality (Version 1). arXiv. https://doi.org/10.48550/ARXIV.1310.4546
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1607.04606
Rei, M. (2017). Semi-supervised Multitask Learning for Sequence Labeling. arXiv preprint arXiv:1704.07156. https://arxiv.org/pdf/1704.07156
Joshi, H. (2024). Artificial Intelligence in Project Management: A Study of the Role of AI-Powered Chatbots in Project Stakeholder Engagement. In Indian Journal of Software Engineering and Project Management (Vol. 4, Issue 1, pp. 20–25). Lattice Science Publication (LSP). https://doi.org/10.54105/ijsepm.b9022.04010124
Kumbhakarna, V. M., Kulkarni, S. B., & Dhawale, A. D. (2020). NLP Algorithms Endowed for Automatic Extraction of Information from Unstructured Free Text Reports of Radiology Monarchy. In International Journal of Innovative Technology and Exploring Engineering (Vol. 9, Issue 12, pp. 338–343). https://doi.org/10.35940/ijitee.l8009.1091220
Chellatamilan, T., Valarmathi, B., & Santhi, K. (2020). Research Trends on Deep Transformation Neural Models for Text Analysis in NLP Applications. In International Journal of Recent Technology and Engineering (IJRTE) (Vol. 9, Issue 2, pp. 750–758). https://doi.org/10.35940/ijrte.b3838.079220
Hudaa, S., Setiyadi, D. B. P., Lydia, E. L., Shankar, K., Nguyen, P. T., Hashim, W., & Maseleno, A. (2019). Natural Language Processing utilization in Healthcare. In International Journal of Engineering and Advanced Technology (Vol. 8, Issue 6s2, pp. 1117–1120). https://doi.org/10.35940/ijeat.f1305.0886s219