Multi-Attribute Feature Extraction and Selection for Emotion Recognition from Speech through Machine Learning



Bibliographic Details
Published in: Traitement du signal 2023-02, Vol. 40 (1), p. 265-275
Main Authors: Ramyasree, Kummari; Kumar, Chennupati Sumanth
Format: Article
Language: English
Online Access: Full Text
Description
Abstract: Despite its wide use in emotion-related applications, speech-based emotion recognition remains challenging because of its complexity. In this paper, we develop a framework built on three feature groups: prosodic, wavelet, and spectral features. Under prosodic features, pitch and energy are considered, while under wavelet features, the approximation and detail sub-bands at four scales are used. Mel-Frequency Cepstral Coefficients (MFCC), formants, and the Long-Term Average Spectrum (LTAS) are measured from the speech signal as spectral features. The significant features are then selected based on nonlinear statistics, with Spearman rank correlation providing the nonlinear correlation measure, and dimensionality reduction is performed through the Fisher criterion. A Support Vector Machine and a Decision Tree are used for classification. The proposed method is evaluated on the RAVDESS, SAVEE, EMOVO, and URDU databases, and the observed recognition rates are approximately 79.66%, 88.99%, 87.68%, and 95.78%, respectively.
ISSN: 0765-0019; 1958-5608
DOI: 10.18280/ts.400126
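
As a rough illustration of the pipeline summarised in the abstract above, the following Python sketch extracts prosodic, wavelet, and spectral descriptors, screens them with Spearman rank correlation, ranks the survivors with a Fisher score, and hands the selected columns to an SVM or Decision Tree. It is a minimal sketch only, assuming librosa, PyWavelets, and scikit-learn; the wavelet basis (db4), the 13-MFCC summary, the thresholds, and the helper names extract_features / select_features are illustrative assumptions rather than the authors' implementation, and formants and LTAS are omitted.

import numpy as np
import librosa
import pywt
from scipy.stats import spearmanr
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def extract_features(y, sr):
    """Concatenate prosodic, wavelet, and spectral descriptors for one utterance."""
    # Prosodic: fundamental frequency (YIN) and short-time energy, summarised by mean/std.
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)
    energy = librosa.feature.rms(y=y)[0]
    prosodic = [np.mean(f0), np.std(f0), np.mean(energy), np.std(energy)]

    # Wavelet: approximation + detail sub-band log-energies over four decomposition levels
    # (db4 basis is an assumption; the paper only states four scales).
    coeffs = pywt.wavedec(y, 'db4', level=4)
    wavelet = [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]

    # Spectral: 13 MFCCs averaged over frames, standing in for the full MFCC/formant/LTAS set.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

    return np.concatenate([prosodic, wavelet, mfcc])


def select_features(X, labels, corr_threshold=0.1, n_keep=20):
    """Keep features correlated with the labels (Spearman), then rank them by Fisher score."""
    labels = np.asarray(labels)
    rho = np.array([abs(spearmanr(X[:, j], labels)[0]) for j in range(X.shape[1])])
    keep = np.where(rho >= corr_threshold)[0]

    # Fisher criterion per feature: between-class spread over within-class variance.
    classes = np.unique(labels)
    mu = X[:, keep].mean(axis=0)
    num = sum((X[labels == c][:, keep].mean(axis=0) - mu) ** 2 for c in classes)
    den = sum(X[labels == c][:, keep].var(axis=0) for c in classes) + 1e-12
    return keep[np.argsort(num / den)[::-1][:n_keep]]


# Usage, assuming X_train/X_test rows come from extract_features and
# y_train/y_test hold integer-coded emotion labels:
# cols = select_features(X_train, y_train)
# for clf in (SVC(kernel='rbf'), DecisionTreeClassifier()):
#     clf.fit(X_train[:, cols], y_train)
#     print(clf.score(X_test[:, cols], y_test))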