Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning

Dr. Devraj
Dr. Ravindra Nath
Nikita Singh
Vibhushit Katiyar
Amber Srivastava

Abstract

Speech Emotion Recognition (SER) has emerged as a significant research area within Human–Computer Interaction (HCI), enabling intelligent systems to interpret human emotional states from spoken audio. Accurate emotion recognition from speech plays a crucial role in enhancing natural interaction between humans and machines. This paper presents a deep learning–based SER framework that combines Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction with Long Short-Term Memory (LSTM) networks for temporal modelling and emotion classification. MFCC features effectively capture the spectral characteristics of speech signals, whereas LSTM networks are well-suited to modelling long-term temporal dependencies inherent in emotional speech patterns. The proposed model is trained and evaluated on the Toronto Emotional Speech Set (TESS) dataset, which covers multiple emotional categories, including happiness, sadness, anger, fear, and neutrality. Experimental results demonstrate that the proposed MFCC–LSTM approach achieves promising classification accuracy, indicating its effectiveness in recognising emotional states from speech signals. The findings highlight the potential applicability of the proposed system in real-world scenarios, including virtual assistants, call centre analytics, and mental health monitoring systems, thereby contributing to the development of emotion-aware intelligent interfaces.

Article Details

How to Cite
[1]
Dr. Devraj, Dr. Ravindra Nath, Nikita Singh, Vibhushit Katiyar, and Amber Srivastava, “Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning”, IJSP, vol. 6, no. 1, pp. 1–6, Feb. 2026, doi: 10.54105/ijsp.A1017.06010226.
Section
Articles
