Learning facial expression and body gesture visual information for video emotion recognition

Bibliographic details
Published in:	Expert Systems with Applications, 2024-03, Vol. 237, p. 121419, Article 121419
Main authors:	Wei, Jie; Hu, Guanyu; Yang, Xinyu; Luu, Anh Tuan; Dong, Yizhuo
Format:	Article
Language:	English
Subjects:	Body joints; Facial expression; Gesture representation; Spatio-temporal features; Video emotion recognition
Online access:	Full text

Description
Abstract:	Recent research has shown that facial expressions and body gestures are two significant cues for identifying human emotions. However, these studies mainly focus on contextual information from adjacent frames and rarely explore the spatio-temporal relationships between distant or global frames. In this paper, we revisit the facial expression and body gesture emotion recognition problems and propose to improve the performance of video emotion recognition by extracting spatio-temporal features through further encoding of temporal information. Specifically, for facial expressions, we propose a super image-based spatio-temporal convolutional model (SISTCM) and a two-stream LSTM model to capture local spatio-temporal features and learn global temporal cues of emotion changes. For body gestures, a novel representation method and an attention-based channel-wise convolutional model (ACCM) are introduced to learn key joint features and the independent characteristics of each joint. Extensive experiments on five common datasets demonstrate the superiority of the proposed method, and the results show that learning from both kinds of visual information leads to significant improvement over existing state-of-the-art methods.
• A novel spatio-temporal feature extraction model is proposed.
• The progressive relationship of emotion expressions is considered.
• Changes in body gestures are encoded for emotion recognition.
• Visual information from both facial expressions and body gestures is used.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2023.121419