Quality-Aware Bag of Modulation Spectrum Features for Robust Speech Emotion Recognition
Published in: | IEEE Transactions on Affective Computing, 2022-10, Vol. 13 (4), pp. 1892-1905 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | Automatic speech emotion recognition (SER) has gained popularity over the last decade and numerous Challenges have emerged. While the latest Challenges have shown that deep neural networks achieve the best results, existing input features are still a bottleneck and cause severe performance degradation in realistic "in-the-wild" scenarios. In this paper, we propose two innovations to tackle this issue. First, we propose to combine the bag-of-audio-words methodology with modulation spectrum features for environmental robustness. Second, we take advantage of the inherent quality-awareness properties of modulation spectrum and propose the use of a quality feature as an additional feature to be used by the speech emotion recognizer. Experiments are conducted with three multi-lingual speech datasets used in recent SER Challenges degraded by different noise sources and levels, and room reverberation. Experimental results show the proposed features i) consistently outperforming benchmark systems, ii) providing complementary information to classical features, hence improving performance with feature fusion, and iii) showing robustness against environment and language mismatch. Moreover, we show that when the proposed system is provided with quality information, further improvements are obtained. Overall, the proposed bag of modulation spectrum features are shown to be a promising candidate for "in-the-wild" SER. |
---|---|
ISSN: | 1949-3045 |
DOI: | 10.1109/TAFFC.2022.3188223 |
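
The abstract describes combining the bag-of-audio-words methodology with modulation spectrum features and appending a quality feature. The record gives no implementation details, so the following is only a minimal sketch of the general bag-of-audio-words idea, not the authors' system. It assumes frame-level feature vectors are already available (random toy data stands in for modulation spectrum features here), that the codebook is drawn by random sampling of training frames, and that the quality feature is a single scalar; the function names `build_codebook` and `bag_of_audio_words` and the value of `quality` are hypothetical.

```python
import numpy as np


def build_codebook(train_frames, num_words, seed=0):
    """Pick `num_words` distinct training frames at random as codewords (an assumption)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_frames), size=num_words, replace=False)
    return train_frames[idx]


def bag_of_audio_words(frames, codebook):
    """Assign each frame to its nearest codeword and return a normalized histogram."""
    # Squared Euclidean distance between every frame and every codeword.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    # Count how often each codeword was selected within the utterance.
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()


# Toy data standing in for frame-level modulation spectrum features.
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 8))   # 500 training frames, 8 dimensions each
utterance = rng.normal(size=(120, 8))      # one utterance of 120 frames

codebook = build_codebook(train_frames, num_words=16)
bow = bag_of_audio_words(utterance, codebook)

# Hypothetical scalar quality estimate appended as one extra input feature.
quality = 0.5
features = np.concatenate([bow, [quality]])
```

The resulting `features` vector has a fixed length (codebook size plus one) regardless of utterance duration, which is what lets a standard classifier consume utterances of varying length.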