Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual
Published in: | IEEE Signal Processing Letters, 2021, Vol. 28, pp. 533-537 |
---|---|
Authors: | , |
Format: | Article |
Language: | English |
Abstract: | To fuse speech and visual features effectively, this letter proposes a multi-modal emotion recognition method that fuses the correlation features of the speech and visual modalities. First, speech and visual features are extracted by a two-dimensional convolutional neural network (2D-CNN) and a three-dimensional convolutional neural network (3D-CNN), respectively. Second, the speech and visual features are processed by a feature correlation analysis algorithm during multi-modal fusion. The class information of the speech and visual features is also incorporated into this correlation analysis, which fuses the two modalities more effectively and improves multi-modal emotion recognition performance. Finally, a support vector machine (SVM) classifies the fused speech-visual features into emotion categories. Experimental results on the RML, eNTERFACE05, and BAUM-1s datasets show that the proposed method achieves higher recognition rates than other state-of-the-art methods. |
ISSN: | 1070-9908, 1558-2361 |
DOI: | 10.1109/LSP.2021.3055755 |
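The abstract assigns a 2D-CNN to speech and a 3D-CNN to the visual stream. A minimal sketch of the difference follows. It assumes, without confirmation from this record, that speech enters as a time-frequency map such as a log-Mel spectrogram and that video enters as a stack of face frames. A 2D kernel slides over two axes, while a 3D kernel also slides over the frame axis and therefore responds to changes across neighboring frames. The functions `conv2d_valid` and `conv3d_valid` and all input sizes are hypothetical; the letter's actual layers and input dimensions are not given here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(spectrogram, kernel):
    """2-D 'valid' cross-correlation: the kernel slides over (frequency, time)."""
    windows = sliding_window_view(spectrogram, kernel.shape)  # (F-kf+1, T-kt+1, kf, kt)
    return np.einsum("ftij,ij->ft", windows, kernel)


def conv3d_valid(clip, kernel):
    """3-D 'valid' cross-correlation: the kernel also slides over the frame axis,
    so each output value mixes information from neighboring frames."""
    windows = sliding_window_view(clip, kernel.shape)  # (T', H', W', kt, kh, kw)
    return np.einsum("thwijk,ijk->thw", windows, kernel)


rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 100))      # hypothetical: 64 Mel bands x 100 time steps
clip = rng.standard_normal((16, 48, 48))   # hypothetical: 16 frames of 48x48 face crops
print(conv2d_valid(spec, np.ones((3, 3))).shape)      # (62, 98)
print(conv3d_valid(clip, np.ones((3, 3, 3))).shape)   # (14, 46, 46)
```

A trained CNN stacks many such learned kernels with nonlinearities and pooling; the sketch only shows which axes each kernel type covers.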
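The record does not name the feature correlation analysis algorithm. As one illustration of correlation-based fusion that uses class information, the sketch below follows discriminant correlation analysis (DCA). This choice is an assumption and is not confirmed as the letter's method. Each modality is first projected so that its between-class scatter becomes the identity matrix. An SVD of the cross-modal covariance then turns that covariance into the identity, so each transformed speech component correlates only with its visual counterpart. The results are concatenated (or summed). `dca_fuse`, its `mode` parameter and the synthetic features are hypothetical stand-ins for the 2D-CNN and 3D-CNN outputs.

```python
import numpy as np

# Assumption: DCA-style fusion; the record does not name the correlation algorithm.


def _between_class_projection(feats, labels, r):
    """Return a (dim, r) matrix W with W.T @ S_b @ W = I, where S_b is the
    between-class scatter of `feats` (rows are samples)."""
    classes = np.unique(labels)
    mu = feats.mean(axis=0)
    # Phi_b has one column per class: sqrt(n_k) * (class mean - global mean)
    phi = np.stack(
        [np.sqrt(np.sum(labels == k)) * (feats[labels == k].mean(axis=0) - mu)
         for k in classes],
        axis=1,
    )
    # Eigen-decompose the small (c, c) matrix Phi^T Phi instead of the (dim, dim) S_b
    evals, evecs = np.linalg.eigh(phi.T @ phi)
    top = np.argsort(evals)[::-1][:r]
    return phi @ evecs[:, top] / np.sqrt(evals[top])


def dca_fuse(speech, visual, labels, mode="concat"):
    """Fuse feature sets of shape (n, p) and (n, q) using class labels."""
    labels = np.asarray(labels)
    r = min(len(np.unique(labels)) - 1, speech.shape[1], visual.shape[1])
    x = speech - speech.mean(axis=0)
    y = visual - visual.mean(axis=0)
    # Step 1: within each modality, map the between-class scatter to the identity
    xp = x @ _between_class_projection(x, labels, r)
    yp = y @ _between_class_projection(y, labels, r)
    # Step 2: SVD of the cross-modal covariance; afterwards xs.T @ ys == I
    u, s, vt = np.linalg.svd(xp.T @ yp)
    xs = xp @ (u / np.sqrt(s))
    ys = yp @ (vt.T / np.sqrt(s))
    return np.hstack([xs, ys]) if mode == "concat" else xs + ys


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(6), 20)   # hypothetical: 6 emotion classes x 20 clips
speech = rng.standard_normal((6, 128))[labels] + rng.standard_normal((120, 128))
visual = rng.standard_normal((6, 256))[labels] + rng.standard_normal((120, 256))
fused = dca_fuse(speech, visual, labels)
print(fused.shape)  # (120, 10)
```

With concatenation the fused vector has at most 2(c - 1) columns for c emotion classes, which is far smaller than the raw CNN features.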
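The final step hands the fused features to an SVM. The record does not specify the kernel, solver or multi-class scheme, so the following is only a simplified stand-in. It assumes a linear kernel and a one-vs-rest scheme trained by full-batch subgradient descent on the L2-regularized hinge loss. In practice a library solver (for example LIBSVM or scikit-learn's `SVC`) is the usual choice. `train_linear_ovr_svm`, `predict`, the hyperparameters `lam`, `lr` and `epochs`, and the toy data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: linear one-vs-rest SVM; the letter's SVM setup is not given here.


def train_linear_ovr_svm(feats, labels, lam=1e-2, lr=0.1, epochs=500):
    """Train one binary linear SVM per class by subgradient descent on the
    regularized hinge loss. Returns (class ids, weights (c, d), biases (c,))."""
    classes = np.unique(labels)
    n, d = feats.shape
    weights = np.zeros((len(classes), d))
    biases = np.zeros(len(classes))
    for k, cls in enumerate(classes):
        y = np.where(labels == cls, 1.0, -1.0)   # current class vs. the rest
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            viol = y * (feats @ w + b) < 1.0      # samples violating the margin
            grad_w = lam * w - (y[viol] @ feats[viol]) / n
            grad_b = -y[viol].sum() / n
            w -= lr * grad_w
            b -= lr * grad_b
        weights[k], biases[k] = w, b
    return classes, weights, biases


def predict(model, feats):
    classes, weights, biases = model
    # Pick the class whose one-vs-rest decision value is largest
    return classes[np.argmax(feats @ weights.T + biases, axis=1)]


rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)   # toy stand-in for fused speech-visual features
feats = 3.0 * np.eye(3, 4)[labels] + 0.5 * rng.standard_normal((90, 4))
model = train_linear_ovr_svm(feats, labels)
print((predict(model, feats) == labels).mean())
```

One-vs-rest keeps each binary problem independent; taking the argmax of the decision values turns the c binary scores into a single emotion label.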