Speech emotion recognition using feature fusion: a hybrid approach to deep learning


Bibliographic Details
Published in: Multimedia Tools and Applications, 2024-02, Vol. 83 (31), p. 75557-75584
Authors: Khan, Waleed Akram; ul Qudous, Hamad; Farhan, Asma Ahmad
Format: Article
Language: English
Description
Abstract: Speech emotion recognition is important because it enables machines to understand and respond to human emotions, improving human-computer interaction and personalized experiences. Accurate identification and interpretation of emotional states from speech signals offers benefits ranging from personalized experiences and mental-health monitoring to better human-computer interfaces. Recognizing emotions from speech remains difficult, however, primarily because of the large gap between acoustic features and human emotions. Both vocal cues and spoken words play significant roles in determining a person's emotional state, so accurately identifying emotions from speech requires extracting distinct and meaningful acoustic features. In this paper, we propose a novel approach to inferring human emotional states, a task with applications ranging from customer service to mental health. The proposed approach extracts a set of features from the speech signal and feeds them into a framework consisting of a deep stride convolutional neural network combined with a bidirectional LSTM. On the RAVDESS dataset, the proposed model achieves 95% accuracy, almost 20% higher than the state of the art, while also minimizing the loss.
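
To make the architecture class named in the abstract more concrete, below is a minimal PyTorch sketch of a strided-CNN front end feeding a bidirectional LSTM classifier. The input shape (log-mel/MFCC-style feature maps), layer sizes, and the eight-class output (matching RAVDESS's emotion categories) are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch (not the authors' code): strided CNN + BiLSTM emotion classifier.
# Assumes input features of shape (batch, 1, n_mels, time) and 8 RAVDESS emotion classes.
import torch
import torch.nn as nn

class StrideCNNBiLSTM(nn.Module):
    def __init__(self, n_mels=40, n_classes=8, hidden=128):
        super().__init__()
        # Strided convolutions downsample in place of pooling ("deep stride" CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
        )
        feat_dim = 64 * (n_mels // 4)  # channels * reduced frequency bins
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                         # x: (batch, 1, n_mels, time)
        z = self.cnn(x)                           # (batch, 64, n_mels//4, time//4)
        z = z.permute(0, 3, 1, 2).flatten(2)      # (batch, time//4, feat_dim)
        out, _ = self.lstm(z)                     # (batch, time//4, 2*hidden)
        return self.fc(out[:, -1])                # classify from the final time step

# Quick shape check with a dummy batch of two clips (~100 feature frames each).
model = StrideCNNBiLSTM()
logits = model(torch.randn(2, 1, 40, 100))
print(logits.shape)  # torch.Size([2, 8])
```

The design intent of this pairing is that the strided convolutions learn local spectral patterns while downsampling the time axis, and the BiLSTM then models longer-range temporal context in both directions before classification.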
ISSN: 1380-7501 (print); 1573-7721 (electronic)
DOI: 10.1007/s11042-024-18316-7