Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning

Dr. Devraj
Dr. Ravindra Nath
Nikita Singh
Vibhushit Katiyar
Amber Srivastava

Abstract

Speech Emotion Recognition (SER) has emerged as a significant research area within Human–Computer Interaction (HCI), enabling intelligent systems to interpret human emotional states from spoken audio. Accurate emotion recognition from speech plays a crucial role in enhancing natural interaction between humans and machines. This paper presents a deep learning–based SER framework that combines Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction with Long Short-Term Memory (LSTM) networks for temporal modelling and emotion classification. MFCC features effectively capture the spectral characteristics of speech signals, whereas LSTM networks are well-suited to modelling long-term temporal dependencies inherent in emotional speech patterns. The proposed model is trained and evaluated on the Toronto Emotional Speech Set (TESS) dataset, which covers multiple emotional categories, including happiness, sadness, anger, fear, and neutrality. Experimental results demonstrate that the proposed MFCC–LSTM approach achieves promising classification accuracy, indicating its effectiveness in recognising emotional states from speech signals. The findings highlight the potential applicability of the proposed system in real-world scenarios, including virtual assistants, call centre analytics, and mental health monitoring systems, thereby contributing to the development of emotion-aware intelligent interfaces.

Article Details

How to Cite
[1]
Dr. Devraj, Dr. Ravindra Nath, Nikita Singh, Vibhushit Katiyar, and Amber Srivastava, “Speech Emotion Recognition Using MFCC Features and LSTM-Based Deep Learning”, IJSP, vol. 6, no. 1, pp. 1–6, Feb. 2026, doi: 10.54105/ijsp.A1017.06010226.
Section
Articles
