Speech Emotion Recognition Using Machine Learning: A Comparative Analysis
Saved in:

| Published in: | SN computer science 2024-04, Vol.5 (4), p.390, Article 390 |
|---|---|
| Main authors: | , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
| Abstract: | It is possible to identify emotions from a person's speech, and research on recognizing emotions expressed through the voice continues to evolve. This study uses the SAVEE and IEMOCAP datasets to explore Speech Emotion Recognition. The SAVEE dataset covers seven emotions, while 4 of the 11 emotions in the IEMOCAP dataset are considered. The features ZCR, MFCC, F0, and RMS are extracted from the raw audio files, and their means are computed and fed as input for training the models. The study presents a comparative analysis of emotion detection on both datasets using the RNN, LSTM, Bi-LSTM, RF, Rotation Forest, and fuzzy models. The RF and Bi-LSTM models achieve the highest accuracies on the SAVEE dataset, 76% and 72% respectively, compared with the other trained models. The fuzzy and Rotation Forest implementations could be improved with further optimization techniques. Additionally, a diagnostic user interface is developed for analyzing audio, loading datasets, extracting features, training models, and classifying human emotions from audio using the trained models. |
| ISSN: | 2662-995X, 2661-8907 |
| DOI: | 10.1007/s42979-024-02656-0 |
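The feature-extraction step described in the abstract (framewise ZCR, MFCC, F0, and RMS, reduced to per-file means) can be sketched in code. The snippet below is a minimal pure-Python illustration covering only the ZCR and RMS means on a synthetic sine tone; it is an assumption about the general technique, not the authors' implementation, and in practice MFCC and F0 would come from an audio library such as librosa. The frame length and hop size are illustrative choices.

```python
import math

def zero_crossing_rate(frame):
    # Fraction of consecutive sample pairs whose sign differs.
    crossings = sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def rms(frame):
    # Root-mean-square energy of one frame.
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def frame_signal(signal, frame_len=1024, hop=512):
    # Split the signal into overlapping frames (illustrative sizes).
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def mean_features(signal):
    # Per the abstract: compute framewise features, then their means.
    frames = frame_signal(signal)
    zcrs = [zero_crossing_rate(f) for f in frames]
    rmss = [rms(f) for f in frames]
    return {
        "zcr_mean": sum(zcrs) / len(zcrs),
        "rms_mean": sum(rmss) / len(rmss),
    }

# Example input: one second of a 440 Hz sine at a 16 kHz sample rate.
sr, f0 = 16000, 440.0
signal = [math.sin(2 * math.pi * f0 * n / sr) for n in range(sr)]
feats = mean_features(signal)
```

For a pure sine, the RMS mean comes out near 1/√2 ≈ 0.707 and the ZCR mean near 2·f0/sr, which is a quick sanity check that such a feature pipeline is wired up correctly before feeding the means into the classifiers.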