Spatio-Temporal-Based Action Face Anti-Spoofing Detection via Fusing Dynamics and Texture Face Keypoints Cues
Published in: | IEEE Transactions on Consumer Electronics 2024-02, Vol.70 (1), p.2401-2413 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | In recent years, action-based face anti-spoofing (FAS) has become widespread in identity security authentication scenarios. However, the performance of current action FAS methods is sensitive to lighting conditions and to the scale of the action. Furthermore, because facial actions are simple, action FAS on its own cannot withstand highly realistic 3D counterfeit face attacks. To address these issues, this paper proposes a spatio-temporal action FAS framework that combines dynamics and texture information around local keypoint areas of the face. To tackle the issue of action scale, we integrate a clip of face action video into a single face keypoint-based feature map, incorporating motion trajectories and motion history cues. Specifically, we develop two features in the spatio-temporal domain: Keypoints Trajectory Invariant Feature (KTIF) and Keypoints Motion History Feature (KMHF). These features are constructed from the absolute positions of keypoints and the relative positional offsets of keypoints between frames. Additionally, to further enhance the security level of the action FAS task, we propose a novel representation called Keypoints Neighbourhood Texture Difference Feature (KNTDF). This representation uses an Encoder-Decoder module with center difference convolution (CDC) to address 2D/3D spoofing at the image noise level. Finally, by embedding a channel attention module and a skip fusion strategy in the Swin-transformer architecture, the three deep representations are fused and then classified for both the face action classification (FAC) task and the face anti-spoofing task. The experimental results show that the proposed method performs well on both tasks. For action FAS in particular, it is more robust to variations in lighting conditions and motion scale than the majority of existing work. |
---|---|
ISSN: | 0098-3063, 1558-4127 |
DOI: | 10.1109/TCE.2024.3361480 |
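
The abstract names center difference convolution (CDC) as the operator behind KNTDF but does not give its formula. As a rough illustration only, the sketch below uses the commonly cited CDC formulation, in which a standard convolution response is reduced by `theta` times the center pixel multiplied by the sum of the kernel weights. The function name `center_difference_conv2d`, the default `theta=0.7`, the single-channel input, and the zero padding are all assumptions made for this example; it is not the paper's Encoder-Decoder module.

```python
import numpy as np


def center_difference_conv2d(x, w, theta=0.7):
    """Single-channel center difference convolution (illustrative sketch).

    Computes, for every pixel p0:
        y(p0) = sum_n w(pn) * x(p0 + pn) - theta * x(p0) * sum_n w(pn)
    theta = 0 gives a plain (cross-correlation style) convolution;
    theta = 1 responds only to differences from the center pixel.
    The kernel is assumed to have odd height and width.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    # Zero padding so the output has the same shape as the input (assumption).
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    w_sum = w.sum()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i + kh, j:j + kw]
            vanilla = np.sum(w * patch)  # standard convolution term
            out[i, j] = vanilla - theta * x[i, j] * w_sum  # center difference term
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernel = rng.normal(size=(3, 3))
    flat = np.full((5, 5), 3.0)  # texture-free patch
    response = center_difference_conv2d(flat, kernel, theta=1.0)
    # Away from the zero-padded border, a flat patch yields no response.
    print(np.allclose(response[1:-1, 1:-1], 0.0))
```

With `theta=1.0` the operator ignores uniform intensity and reacts only to local intensity differences, which is consistent with the abstract's emphasis on texture cues that are less sensitive to lighting.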