Recognizing facial expressions based on pyramid multi-head grid and spatial attention network
Published in: Computer vision and image understanding, 2024-07, Vol. 244, p. 104010, Article 104010
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Facial Expression Recognition (FER) has garnered considerable interest in the field of computer vision. It is a challenging task that faces several key problems, such as inter-class similarity, intra-class variability, and environment sensitivity. Traditional Convolutional Neural Networks (CNNs) are limited by their locality and therefore have difficulty learning long-range dependencies between elements in an image, which degrades performance. An innovative expression analysis system that relies on a pyramid multi-head grid and spatial attention network (PMAN) is presented to address these issues. The PMAN is divided into two stages: an initial feature extraction stage, in which the correlations between various facial zones are learned using Multi-head Grid Attention (MGA), and a deep feature learning stage, in which Multi-head Spatial Attention (MSA) is employed to improve the model's global attention to facial features. In addition, a unique feature pyramid design is implemented in the deep feature learning stage to reduce the network's sensitivity to face image size. The experiments show that PMAN not only performs significantly better than existing methods on CK+, RAF-DB, FER+, and AffectNet, but also achieves 100% accuracy on the CK+ dataset without using pre-trained models.
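The abstract only names the building blocks, so a brief illustration may help. The following is a minimal PyTorch sketch of how multi-head spatial attention could be applied over a small feature pyramid, in the spirit of the deep feature learning stage described above. It is not the authors' implementation: the module names, head counts, pyramid scales, and fusion by averaging are all assumptions made for illustration.

```python
# Illustrative sketch only; hyperparameters and fusion strategy are assumptions,
# not the PMAN architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSpatialAttention(nn.Module):
    """Each head predicts a spatial (H x W) map that reweights the feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # One 1x1 convolution per head maps the features to a single-channel attention logit map.
        self.head_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); each head reweights x with its own sigmoid attention map.
        attended = [x * torch.sigmoid(conv(x)) for conv in self.head_convs]
        # Average the heads; the paper may fuse them differently (e.g. by concatenation).
        return torch.stack(attended, dim=0).mean(dim=0)


class PyramidSpatialAttention(nn.Module):
    """Applies the attention at several scales to reduce sensitivity to face image size."""

    def __init__(self, channels: int, scales=(1.0, 0.5, 0.25), num_heads: int = 4):
        super().__init__()
        self.scales = scales
        self.attn = MultiHeadSpatialAttention(channels, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        outputs = []
        for s in self.scales:
            # Downsample, attend, then upsample back so all scales can be fused.
            xs = x if s == 1.0 else F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
            ys = self.attn(xs)
            outputs.append(F.interpolate(ys, size=(h, w), mode="bilinear", align_corners=False))
        return sum(outputs) / len(outputs)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 28, 28)                 # dummy CNN feature maps
    print(PyramidSpatialAttention(64)(feats).shape)    # torch.Size([2, 64, 28, 28])
```

Processing the same attention module at multiple resolutions and fusing the results is one way to make the response less dependent on the input face size, which is the stated purpose of the pyramid design.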
• We alleviated the inter-class similarity and intra-class variability issues in FER.
• We proposed multi-head grid attention to guide the CNN in learning lower-level features.
• We designed spatial attention into the pyramid to obtain more subtle features.
• We proposed a distraction loss to enhance the ability of each attention head (a sketch follows below).
• The experimental results on 4 datasets outperform existing state-of-the-art models.
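The highlights also mention a distraction loss for the attention heads. The sketch below shows one plausible form of such a loss, assuming, as the name suggests and as is common for multi-head attention regularizers, that it penalizes overlap between the heads' attention maps so each head focuses on a distinct facial region; the paper's exact formulation may differ, and the tensor layout is an assumption.

```python
# Hypothetical form of a head-diversity ("distraction") penalty; not the paper's definition.
import torch
import torch.nn.functional as F


def distraction_loss(attn_maps: torch.Tensor) -> torch.Tensor:
    """attn_maps: (B, K, H, W) attention maps, one per head; K > 1 is assumed."""
    b, k, h, w = attn_maps.shape
    flat = F.normalize(attn_maps.reshape(b, k, h * w), dim=-1)   # unit-norm map per head
    sim = torch.bmm(flat, flat.transpose(1, 2))                  # (B, K, K) pairwise cosine similarity
    off_diag = sim - torch.eye(k, device=sim.device)             # drop each head's self-similarity
    # Penalize overlap: the loss is small when the heads attend to distinct regions.
    return (off_diag ** 2).sum(dim=(1, 2)).mean() / (k * (k - 1))


if __name__ == "__main__":
    maps = torch.rand(2, 4, 28, 28)      # dummy maps from 4 attention heads
    print(distraction_loss(maps).item())
```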
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2024.104010