Recognizing facial expressions based on pyramid multi-head grid and spatial attention network
Published in: Computer vision and image understanding, 2024-07, Vol. 244, p. 104010, Article 104010
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Facial Expression Recognition (FER) has garnered considerable interest in the field of computer vision. It is a challenging task that faces several key problems, such as inter-class similarity, intra-class variability, and environment sensitivity. Traditional Convolutional Neural Networks (CNNs) are limited by their locality and therefore have difficulty learning long-range dependencies between elements in an image, which degrades performance. An innovative expression analysis system that relies on a pyramid multi-head grid and spatial attention network (PMAN) is presented to address these issues. The PMAN is divided into two stages: an initial feature extraction stage, in which the correlations between various facial zones are learned using Multi-head Grid Attention (MGA), and a deep feature learning stage, in which Multi-head Spatial Attention (MSA) is employed to improve the model's global attention to facial features. In addition, a unique feature pyramid design is implemented in the deep feature learning stage to reduce the network's sensitivity to face image size. The experiments show that PMAN not only performs significantly better than existing methods on CK+, RAF-DB, FER+, and AffectNet, but also achieves 100% accuracy on the CK+ dataset without using pre-trained models.
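The abstract only names the building blocks, so a brief illustration may help. The following is a minimal PyTorch sketch of how multi-head spatial attention could be applied over a small feature pyramid, in the spirit of the deep feature learning stage described above. It is not the authors' implementation: the module names, head counts, pyramid scales, and fusion by averaging are all assumptions made for illustration.

```python
# Illustrative sketch only; hyperparameters and fusion strategy are assumptions,
# not the PMAN architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSpatialAttention(nn.Module):
    """Each head predicts a spatial (H x W) map that reweights the feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # One 1x1 convolution per head maps the features to a single-channel attention logit map.
        self.head_convs = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); each head reweights x with its own sigmoid attention map.
        attended = [x * torch.sigmoid(conv(x)) for conv in self.head_convs]
        # Average the heads; the paper may fuse them differently (e.g. by concatenation).
        return torch.stack(attended, dim=0).mean(dim=0)


class PyramidSpatialAttention(nn.Module):
    """Applies the attention at several scales to reduce sensitivity to face image size."""

    def __init__(self, channels: int, scales=(1.0, 0.5, 0.25), num_heads: int = 4):
        super().__init__()
        self.scales = scales
        self.attn = MultiHeadSpatialAttention(channels, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        outputs = []
        for s in self.scales:
            # Downsample, attend, then upsample back so all scales can be fused.
            xs = x if s == 1.0 else F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
            ys = self.attn(xs)
            outputs.append(F.interpolate(ys, size=(h, w), mode="bilinear", align_corners=False))
        return sum(outputs) / len(outputs)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 28, 28)                 # dummy CNN feature maps
    print(PyramidSpatialAttention(64)(feats).shape)    # torch.Size([2, 64, 28, 28])
```

Processing the same attention module at multiple resolutions and fusing the results is one way to make the response less dependent on the input face size, which is the stated purpose of the pyramid design.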
• We alleviated the inter-class similarity and intra-class variability issues in FER.
• We proposed multi-head grid attention to guide the CNN in learning lower-level features.
• We designed spatial attention into the pyramid to obtain more subtle features.
• We proposed a distraction loss to enhance the ability of each attention head (a sketch follows below).
• The experimental results on 4 datasets outperform existing state-of-the-art models.
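The highlights also mention a distraction loss for the attention heads. The sketch below shows one plausible form of such a loss, assuming, as the name suggests and as is common for multi-head attention regularizers, that it penalizes overlap between the heads' attention maps so each head focuses on a distinct facial region; the paper's exact formulation may differ, and the tensor layout is an assumption.

```python
# Hypothetical form of a head-diversity ("distraction") penalty; not the paper's definition.
import torch
import torch.nn.functional as F


def distraction_loss(attn_maps: torch.Tensor) -> torch.Tensor:
    """attn_maps: (B, K, H, W) attention maps, one per head; K > 1 is assumed."""
    b, k, h, w = attn_maps.shape
    flat = F.normalize(attn_maps.reshape(b, k, h * w), dim=-1)   # unit-norm map per head
    sim = torch.bmm(flat, flat.transpose(1, 2))                  # (B, K, K) pairwise cosine similarity
    off_diag = sim - torch.eye(k, device=sim.device)             # drop each head's self-similarity
    # Penalize overlap: the loss is small when the heads attend to distinct regions.
    return (off_diag ** 2).sum(dim=(1, 2)).mean() / (k * (k - 1))


if __name__ == "__main__":
    maps = torch.rand(2, 4, 28, 28)      # dummy maps from 4 attention heads
    print(distraction_loss(maps).item())
```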
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2024.104010