Deep learning and SVM‐based emotion recognition from Chinese speech for smart affective services

Published in: Software: Practice and Experience, 2017-08, Vol. 47 (8), p. 1127-1138
Authors: Zhang, Weishan; Zhao, Dehai; Chai, Zhi; Yang, Laurence T.; Liu, Xin; Gong, Faming; Yang, Su
Format: Article
Language: English
Online access: Full text
Abstract: Emotion recognition is challenging but important for understanding people and enhancing human–computer interaction, and it contributes to the smooth running of smart health care and other smart services. In this paper, several kinds of speech features, such as Mel-frequency cepstrum coefficients (MFCC), pitch, and formants, were extracted and combined in different ways to study the relationship between feature fusion and emotion recognition performance. In addition, we explored two classification methods, support vector machines (SVM) and deep belief networks (DBN), to classify six emotional states: anger, fear, joy, neutral, sadness, and surprise. In the SVM-based method, we used an SVM multi-classification algorithm and optimized the penalty factor and kernel function parameters. With the DBN, we adjusted different parameters to achieve the best performance for each emotion. Both gender-dependent and gender-independent experiments were conducted on the Chinese Academy of Sciences emotional speech database. The mean accuracy of the SVM is 84.54%, and the mean accuracy of the DBN is 94.6%. The experiments show that the DBN-based approach has good potential for practical use, and that suitable feature fusion will further improve the performance of speech emotion recognition. Copyright © 2017 John Wiley & Sons, Ltd.
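
To make the pipeline in the abstract concrete, the following is a minimal sketch of the feature-fusion and SVM stage, assuming librosa and scikit-learn as tooling (the paper does not name its implementation). The LPC-based formant proxy, the parameter grids, and the load_casia_index helper are hypothetical illustrations, not the authors' code.

    # Feature fusion: MFCC + mean pitch + a crude formant proxy, then an SVM
    # tuned over penalty factor C and kernel, as the abstract describes.
    import numpy as np
    import librosa
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # The six target emotional states from the paper.
    EMOTIONS = ["anger", "fear", "joy", "neutral", "sadness", "surprise"]

    def extract_features(path, sr=16000):
        """Fuse MFCC, mean pitch, and a rough formant estimate into one vector."""
        y, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # 13 MFCCs
        f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)        # pitch track
        pitch = float(np.nanmean(f0)) if np.any(voiced) else 0.0
        a = librosa.lpc(y, order=10)                                     # LPC coefficients
        roots = [r for r in np.roots(a) if np.imag(r) > 0]
        freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
        formants = (freqs + [0.0] * 3)[:3]                               # crude F1-F3 proxy
        return np.concatenate([mfcc, [pitch], formants])

    # Hypothetical corpus index; loading the CASIA database is not shown here.
    # files, labels = load_casia_index()
    # X = np.vstack([extract_features(f) for f in files])

    # Grid search over the penalty factor C and the kernel function; the grid
    # values themselves are placeholders, not the paper's settings.
    svm_search = GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10, 100], "svc__kernel": ["rbf", "linear"]},
        cv=5,
    )
    # svm_search.fit(X, labels)

The paper's second classifier is a deep belief network. scikit-learn ships no full DBN, so the stand-in below stacks a single Bernoulli RBM under a logistic-regression top layer; a faithful DBN would greedily pretrain several RBM layers and then fine-tune the whole network.

    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    # One RBM layer + softmax top; hyperparameters are illustrative only.
    dbn_like = Pipeline([
        ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
        ("rbm", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # dbn_like.fit(X, labels)
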
ISSN: 0038-0644, 1097-024X
DOI: 10.1002/spe.2487