A hierarchical depression detection model based on vocal and emotional cues
Published in: Neurocomputing (Amsterdam), 2021-06, Vol. 441, pp. 279-290
Main authors: ,
Format: Article
Language: English
Online access: Full text
Summary: Effective and efficient automatic depression diagnosis is a challenging problem in affective computing. Since speech signals carry useful information for diagnosing depression, in this paper we propose to extract deep speaker recognition (SR) and speech emotion recognition (SER) features using pretrained models, and to fuse the two deep speech features so as to exploit the complementary information between speakers' vocal and emotional differences. In addition, because data for depression recognition are scarce and the diagnosis results are cost-sensitive, we propose a hierarchical depression detection model in which multiple classifiers are placed ahead of a regressor to guide the prediction of depression severity. We evaluate our method on the AVEC 2013 and AVEC 2014 benchmark databases. The results demonstrate that fusing deep SR and SER features improves the model's predictive performance. Using only audio features, the proposed method avoids overfitting and outperforms previous audio-based methods on both databases, while also achieving results comparable to video-based and multimodal methods for depression detection.
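The sketch below illustrates the two ideas named in the abstract under stated assumptions: feature-level fusion of a speaker-recognition (SR) embedding with a speech-emotion-recognition (SER) embedding by concatenation, and a hierarchical scheme in which coarse severity classifiers constrain a downstream regressor. The embedding dimensions, the 0-45 BDI-II-style score range, the severity cut-points, and the choice of logistic regression and ridge regression are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fused deep speech features feed (1) coarse severity
# classifiers and (2) a regressor whose output is clipped to the severity
# band the classifiers select, so coarse decisions guide the fine prediction.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
n, d_sr, d_ser = 200, 192, 128           # assumed embedding dimensions
sr = rng.normal(size=(n, d_sr))          # stand-in for pretrained SR features
ser = rng.normal(size=(n, d_ser))        # stand-in for pretrained SER features
X = np.hstack([sr, ser])                 # feature-level fusion by concatenation
y = rng.uniform(0, 45, size=n)           # synthetic BDI-II-style severity scores

# Stage 1: binary classifiers at hypothetical severity thresholds.
cuts = [14, 28]
clfs = [LogisticRegression(max_iter=1000).fit(X, (y >= c).astype(int))
        for c in cuts]

# Stage 2: a regressor predicts a continuous severity score.
reg = Ridge().fit(X, y)
bounds = [0] + cuts + [45]

def predict_severity(x: np.ndarray) -> float:
    x = x.reshape(1, -1)
    band = sum(int(c.predict(x)[0]) for c in clfs)   # 0, 1, or 2
    raw = float(reg.predict(x)[0])
    # Classifier decisions bound the regression output to one severity band.
    return float(np.clip(raw, bounds[band], bounds[band + 1]))

print(predict_severity(X[0]))
```

In the paper the classifier stage is motivated by cost sensitivity and small data; this sketch only shows the control flow of classifiers gating a regressor, not the authors' specific models or thresholds.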
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2021.02.019