IncepTR: micro-expression recognition integrating Inception-CBAM and Vision Transformer
Published in: Multimedia Systems, 2023-12, Vol. 29 (6), p. 3863-3876
Main authors: , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Micro-Expressions (MEs) are instantaneous and subtle facial movements that convey crucial emotional information. However, traditional neural networks have difficulty accurately capturing the delicate features of MEs because of the limited amount of available data. To address this issue, a dual-branch attention network called IncepTR is proposed for ME recognition, which captures attention-aware local and global representations. The network takes optical flow features as input and performs feature extraction with a dual-branch design. First, an Inception model equipped with the Convolutional Block Attention Module (CBAM) attention mechanism extracts multi-scale local features. Second, a Vision Transformer (ViT) captures subtle motion features and robustly models global relationships among multiple local patches. Additionally, to enrich the relationships between different local patches in the ViT, Multi-head Self-Attention Dropping (MSAD) is introduced to randomly drop an attention map, effectively preventing overfitting to specific regions. Finally, the two types of features are used to learn ME representations through similarity comparison and feature fusion. This combination forces the model to capture the most discriminative multi-scale local and global features while reducing the influence of affect-irrelevant features. Extensive experiments show that the proposed IncepTR achieves UF1 and UAR of 0.753 and 0.746 on the composite dataset MEGC2019-CD, demonstrating better or competitive performance compared with existing state-of-the-art methods for ME recognition.
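The abstract describes MSAD only at a high level: one attention map in multi-head self-attention is dropped at random during training. The PyTorch module below is a minimal sketch of that idea under the assumption that "dropping an attention map" means zeroing one randomly chosen head's post-softmax attention; the class name, shapes, and drop granularity are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class MSAD(nn.Module):
    """Hypothetical sketch of Multi-head Self-Attention Dropping (MSAD)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape  # (batch, patches, embedding dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)  # per-head attention maps: (B, heads, N, N)
        if self.training:
            # Assumed interpretation of MSAD: zero one randomly chosen
            # head's attention map so no single head (region) dominates.
            dropped = torch.randint(self.num_heads, (1,)).item()
            attn = attn.clone()
            attn[:, dropped] = 0.0
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

At inference (`model.eval()`) no map is dropped, mirroring how standard dropout-style regularizers are disabled at test time.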
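The abstract also states that the two branch outputs are combined "through similarity comparison and feature fusion" without giving details. The sketch below assumes one plausible reading: a cosine-similarity term aligns the Inception-CBAM and ViT features of the same optical-flow input, and concatenation fuses them for classification. The function name, the choice of cosine similarity, and the concatenation are all hypothetical.

```python
import torch
import torch.nn.functional as F

def fuse_and_compare(f_local: torch.Tensor, f_global: torch.Tensor,
                     classifier: torch.nn.Module):
    """Hypothetical combination step for the two IncepTR branches.

    f_local:  features from the Inception-CBAM branch, shape (B, D)
    f_global: features from the ViT branch, shape (B, D)
    """
    # Similarity comparison (assumed form): push the two branches to agree
    # on the affect-relevant content of the same optical-flow input.
    sim_loss = 1.0 - F.cosine_similarity(f_local, f_global, dim=-1).mean()
    # Feature fusion (assumed form): concatenate branch features and classify.
    fused = torch.cat([f_local, f_global], dim=-1)
    logits = classifier(fused)
    return logits, sim_loss
```

In training, `sim_loss` would be added to the classification loss with some weight; the paper itself should be consulted for the actual objective.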
ISSN: 0942-4962; 1432-1882
DOI: 10.1007/s00530-023-01164-0