Multi-Attribute Feature Extraction and Selection for Emotion Recognition from Speech through Machine Learning



Bibliographic Details
Published in: Traitement du signal 2023-02, Vol. 40 (1), p. 265-275
Main Authors: Ramyasree, Kummari; Kumar, Chennupati Sumanth
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: Despite its wide use in emotion-related applications, speech-based emotion recognition remains challenging because of its complexity. In this paper, we develop a framework built on three feature groups: prosodic, wavelet, and spectral features. Under prosodic features, pitch and energy are considered, while under wavelet features, the approximation and detail sub-bands at four scales are used. Mel-Frequency Cepstral Coefficients (MFCC), formants, and the Long-Term Average Spectrum (LTAS) are measured from the speech signal as spectral features. The significant features are then selected based on nonlinear statistics, with Spearman rank correlation providing the nonlinear correlation measure, and dimensionality reduction is performed through the Fisher criterion. A Support Vector Machine and a Decision Tree are used for classification. The proposed method is evaluated on the RAVDESS, SAVEE, EMOVO, and URDU databases, and the observed recognition rates are approximately 79.66%, 88.99%, 87.68%, and 95.78%, respectively.
ISSN: 0765-0019; 1958-5608
DOI: 10.18280/ts.400126
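
As a rough illustration of the pipeline summarised in the abstract above, the following Python sketch extracts prosodic, wavelet, and spectral descriptors, screens them with Spearman rank correlation, ranks the survivors with a Fisher score, and hands the selected columns to an SVM or Decision Tree. It is a minimal sketch only, assuming librosa, PyWavelets, and scikit-learn; the wavelet basis (db4), the 13-MFCC summary, the thresholds, and the helper names extract_features / select_features are illustrative assumptions rather than the authors' implementation, and formants and LTAS are omitted.

import numpy as np
import librosa
import pywt
from scipy.stats import spearmanr
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def extract_features(y, sr):
    """Concatenate prosodic, wavelet, and spectral descriptors for one utterance."""
    # Prosodic: fundamental frequency (YIN) and short-time energy, summarised by mean/std.
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)
    energy = librosa.feature.rms(y=y)[0]
    prosodic = [np.mean(f0), np.std(f0), np.mean(energy), np.std(energy)]

    # Wavelet: approximation + detail sub-band log-energies over four decomposition levels
    # (db4 basis is an assumption; the paper only states four scales).
    coeffs = pywt.wavedec(y, 'db4', level=4)
    wavelet = [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]

    # Spectral: 13 MFCCs averaged over frames, standing in for the full MFCC/formant/LTAS set.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

    return np.concatenate([prosodic, wavelet, mfcc])


def select_features(X, labels, corr_threshold=0.1, n_keep=20):
    """Keep features correlated with the labels (Spearman), then rank them by Fisher score."""
    labels = np.asarray(labels)
    rho = np.array([abs(spearmanr(X[:, j], labels)[0]) for j in range(X.shape[1])])
    keep = np.where(rho >= corr_threshold)[0]

    # Fisher criterion per feature: between-class spread over within-class variance.
    classes = np.unique(labels)
    mu = X[:, keep].mean(axis=0)
    num = sum((X[labels == c][:, keep].mean(axis=0) - mu) ** 2 for c in classes)
    den = sum(X[labels == c][:, keep].var(axis=0) for c in classes) + 1e-12
    return keep[np.argsort(num / den)[::-1][:n_keep]]


# Usage, assuming X_train/X_test rows come from extract_features and
# y_train/y_test hold integer-coded emotion labels:
# cols = select_features(X_train, y_train)
# for clf in (SVC(kernel='rbf'), DecisionTreeClassifier()):
#     clf.fit(X_train[:, cols], y_train)
#     print(clf.score(X_test[:, cols], y_test))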