Self-labeling with feature transfer for speech emotion recognition

Most speech emotion recognition methods based on frames have obtained good results in many applications. However, they segment each speech sample into smaller frames that are labeled with the same emotional tag as that of the speech sample. This is inconsistent with the possibility of a speech sampl...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Knowledge-based systems 2022-10, Vol.254, p.109589, Article 109589
Hauptverfasser: Wen, Guihua, Liao, Huiqiang, Li, Huihui, Wen, Pengchen, Zhang, Tong, Gao, Sande, Wang, Bao
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Most speech emotion recognition methods based on frames have obtained good results in many applications. However, they segment each speech sample into smaller frames that are labeled with the same emotional tag as that of the speech sample. This is inconsistent with the possibility of a speech sample containing several emotional categories at the same time. Thus, this paper proposes a self-labeling (SL) learning method for speech emotion recognition, which automatically segments each speech sample into frames and then labels them with the corresponding emotional tags, where the compatibility of these tags is also checked. Then, a time-frequency deep neural network for speech emotion recognition is designed and trained. As most speech emotion datasets are very small, the feature transfer model is applied to further enhance the performance of the SL learning method, which is trained on large-scale audio data. Experimental results on various datasets demonstrate the effectiveness of the proposed method.
ISSN:0950-7051
1872-7409
DOI:10.1016/j.knosys.2022.109589