Skeleton-Based Emotion Recognition Based on Two-Stream Self-Attention Enhanced Spatial-Temporal Graph Convolutional Network

Bibliographic Details
Published in:	Sensors (Basel, Switzerland), 2020-12, Vol.21 (1), p.205
Main Authors:	Shi, Jiaqi; Liu, Chaoran; Ishi, Carlos Toshinori; Ishiguro, Hiroshi
Format:	Article
Language:	English
Subjects:	emotion recognition; gesture; graph convolutional networks; self-attention; skeleton
Online Access:	Full text

Description
Summary:	Emotion recognition has drawn consistent attention from researchers recently. Although the gesture modality plays an important role in expressing emotion, it is seldom considered in the field of emotion recognition. A key reason is the scarcity of labeled data containing 3D skeleton data. Some studies in action recognition have applied graph-based neural networks to explicitly model the spatial connections between joints. However, this method has not yet been considered in the field of gesture-based emotion recognition. In this work, we applied a pose-estimation-based method to extract 3D skeleton coordinates for the IEMOCAP database. We propose a self-attention enhanced spatial-temporal graph convolutional network for skeleton-based emotion recognition, in which the spatial convolutional part models the skeletal structure of the body as a static graph, and the self-attention part dynamically constructs more connections between the joints and provides supplementary information. Our experiment demonstrates that the proposed model significantly outperforms other models and that the features of the extracted skeleton data improve the performance of multimodal emotion recognition.
ISSN:	1424-8220
DOI:	10.3390/s21010205
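
Illustrative sketch (not part of the catalog record or the article's code): the summary describes a spatial part that treats the skeleton as a static graph and a self-attention part that builds additional, data-dependent connections between joints. The NumPy example below shows one way such a layer could look for a single frame. The 5-joint chain skeleton, the random weights, the symmetric normalization of the adjacency matrix, and the choice to sum the two branches are all assumptions made for illustration; the temporal convolution and the two-stream design are omitted.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (a common GCN choice, assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_enhanced_graph_conv(x, adj, w_static, w_q, w_k, w_attn):
    """Apply a static graph convolution plus a self-attention branch.

    x   : (joints, in_channels) joint features for one frame
    adj : (joints, joints) fixed skeleton adjacency (0/1)
    Returns the combined output and the attention matrix.
    """
    # Static branch: aggregate features along the fixed skeleton edges.
    static_out = normalize_adjacency(adj) @ x @ w_static

    # Dynamic branch: scaled dot-product attention between all joint pairs,
    # which yields a dense, input-dependent "adjacency" matrix.
    q = x @ w_q
    k = x @ w_k
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)
    dynamic_out = attn @ x @ w_attn

    # Assumption: the two branches are combined by simple addition.
    return static_out + dynamic_out, attn


rng = np.random.default_rng(0)
num_joints, in_ch, out_ch, d_k = 5, 3, 4, 8

# Hypothetical skeleton: five joints connected in a chain.
adj = np.zeros((num_joints, num_joints))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

x = rng.normal(size=(num_joints, in_ch))  # e.g. 3D joint coordinates
w_static = rng.normal(size=(in_ch, out_ch))
w_q = rng.normal(size=(in_ch, d_k))
w_k = rng.normal(size=(in_ch, d_k))
w_attn = rng.normal(size=(in_ch, out_ch))

out, attn = attention_enhanced_graph_conv(x, adj, w_static, w_q, w_k, w_attn)
print(out.shape)   # one out_ch-dimensional feature vector per joint
print(attn.shape)  # one attention weight per ordered joint pair
```

In this sketch the static adjacency links only neighboring joints, while the attention matrix assigns a nonzero weight to every joint pair, which is the sense in which the self-attention branch can add connections that the fixed skeleton graph lacks.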