Spatio-Temporal-Based Action Face Anti-Spoofing Detection via Fusing Dynamics and Texture Face Keypoints Cues


Bibliographic Details
Published in: IEEE Transactions on Consumer Electronics, 2024-02, Vol. 70 (1), p. 2401-2413
Authors: Liu, Weihua; Pan, Yushan
Format: Article
Language: English
Description
Abstract: In recent years, action-based face anti-spoofing (FAS) has become widespread in various identity security authentication scenarios. However, the current performance of action FAS is susceptible to the impact of lighting conditions and the scale of the action. Furthermore, because facial action behavior is simple, it cannot withstand highly realistic 3D counterfeit face attacks. To address these issues, this paper proposes a spatio-temporal-based action FAS framework that combines dynamics and texture information around local keypoint areas of the face. To tackle the issue of action scale, we integrate a clip of face action video into a single face keypoint-based feature map, incorporating motion trajectories and motion history cues. Specifically, we develop two essential features in the spatio-temporal domain: the Keypoints Trajectory Invariant Feature (KTIF) and the Keypoints Motion History Feature (KMHF). These features are constructed from the absolute positions of keypoints and the relative positional bias of keypoints between frames. Additionally, to further raise the security level of the action FAS task, we propose a novel representation called the Keypoints Neighbourhood Texture Difference Feature (KNTDF). This representation uses an encoder-decoder module with center difference convolution (CDC) to effectively address 2D/3D spoofing at the image-noise level. Finally, by embedding the Swin-transformer architecture with a channel attention module and a skip fusion strategy, the above three deep representations are fused and further classified for both the face action classification (FAC) task and the face anti-spoofing task. The experimental results show that the proposed method performs well on both tasks. Especially for action FAS, compared to the majority of existing work, it demonstrates significant robustness to lighting conditions and motion scales.
ISSN: 0098-3063, 1558-4127
DOI: 10.1109/TCE.2024.3361480
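The center difference convolution (CDC) named in the abstract's KNTDF branch is a known operation from the FAS literature: a vanilla convolution in which each sampled value is first differenced against the receptive field's center pixel, which amplifies fine texture/noise cues useful for spoof detection. As a rough single-channel illustration only (not the authors' implementation; the function names, NumPy setting, and `theta` default here are assumptions), it can be sketched as:

```python
import numpy as np

def conv2d(x, w):
    """Plain 'valid' 2D cross-correlation of image x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def cdc2d(x, w, theta=0.7):
    """Center difference convolution (illustrative sketch).

    Blends central-difference and vanilla terms; algebraically this
    reduces to: vanilla_conv(x, w) - theta * x_center * sum(w).
    theta=0 recovers a plain convolution.  Assumes an odd kernel size.
    """
    vanilla = conv2d(x, w)
    kh, kw = w.shape
    # Center pixel of each receptive field, aligned with the output grid.
    center = x[kh // 2: x.shape[0] - kh // 2, kw // 2: x.shape[1] - kw // 2]
    return vanilla - theta * center * w.sum()
```

Note that on a perfectly flat region (constant intensity) the difference term cancels the vanilla response when `theta = 1`, so the operator responds mainly to local texture variation rather than absolute intensity, which is the property the KNTDF representation relies on.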