Emotion recognition using multi-modal features and CNN classification


Detailed Description

Saved in:
Bibliographic Details
Main Authors: Khanum, Saba Noor Ayesha, Mummadi, Upendra Kumar, Taranum, Fahmina, Ahmad, Syed Shabbeer, Khan, Imtiyaz, Shravani, D.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Full text
Description
Summary: An emerging use of artificial intelligence is automatic emotion recognition. Facial expression identification is an intriguing and challenging problem in computer vision, and speech emotion recognition is one of the most difficult problems in data science. The system that has been built consists of two stages: the first captures face and speech in real time, and the second categorizes the emotions. Data collection, data analysis, and data visualization are the stages of automated emotion identification. The proposed multimodal system uses convolutional neural networks to identify emotions from speech and facial expressions; each block in the sequence is made up of convolution layers and subsampling layers. The model for facial emotion recognition was trained on FER2013, the most challenging of the available datasets, and attains an accuracy of 71%. To address the issue of data scarcity in speech emotion recognition, four distinct datasets (CREMA-D, RAVDESS, SAVEE, and TESS) were integrated; the accuracy achieved for this task is 88%. The proposed approach can recognize eight emotions in total, namely angry, calm, disgust, fear, happy, neutral, sad, and surprised, for both speech and face. Batch normalization, early stopping, and dropout are additionally applied for better performance.
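The abstract describes CNN blocks built from convolution layers followed by subsampling layers. As a minimal illustrative sketch (not the authors' implementation), the following NumPy code shows one such block applied to a 48x48 grayscale input, the image size used by FER2013; the kernel values, ReLU activation, and 2x2 max pooling are assumptions for illustration.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Subsampling layer: max pooling with a size x size window and matching stride."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

# One convolution + subsampling block on a 48x48 input
# (FER2013 images are 48x48 grayscale).
rng = np.random.default_rng(0)
img = rng.random((48, 48))
kernel = rng.random((3, 3))

feat = max_pool(np.maximum(conv2d(img, kernel), 0.0))  # ReLU, then pool
print(feat.shape)  # 48 -> 46 after the 3x3 convolution, -> 23x23 after pooling
```

In a full model, several such blocks would be stacked, with batch normalization and dropout between them and early stopping during training, as the summary mentions.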
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0192751