Face Anti-Spoofing via Adversarial Cross-Modality Translation

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2021, Vol. 16, pp. 2759-2772
Main Authors: Liu, Ajian; Tan, Zichang; Wan, Jun; Liang, Yanyan; Lei, Zhen; Guo, Guodong; Li, Stan Z.
Format: Article
Language: English
Description
Abstract: Face Presentation Attack Detection (PAD) approaches based on multi-modal data have attracted increasing attention from the research community. However, they require multi-modal face data to be available in both the training and testing phases, which severely limits their applicability, since most Face Anti-Spoofing (FAS) systems are equipped only with Visible (VIS) imaging devices, i.e., RGB cameras. How to use another modality (i.e., Near-Infrared (NIR)) to improve the performance of VIS-based PAD is therefore an important question for FAS. In this work, we first discuss the large performance gap among different modalities even when the same backbone network is applied. Then, we propose a novel Cross-modal Auxiliary (CMA) framework for the VIS-based FAS task. The main trait of CMA is that performance can be greatly improved with the help of another modality during training, while no additional modality is required in the testing stage. The proposed CMA consists of a Modality Translation Network (MT-Net) and a Modality Assistance Network (MA-Net). The former aims to close the visual gap between modalities via a generative model that maps inputs from one modality (i.e., RGB) to another (i.e., NIR). The latter focuses on how to use the translated modality (i.e., the target modality) and the RGB modality (i.e., the source modality) together to train a discriminative PAD model. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art (SOTA) performance on both multi-modal datasets (i.e., CASIA-SURF, CeFA, and WMCA) and RGB-based datasets (i.e., OULU-NPU and SiW).
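Based on the abstract alone, a minimal PyTorch-style sketch of the two-network idea might look as follows. The layer sizes, the early-fusion strategy, and the plain L1 translation loss (standing in for the paper's adversarial translation objective) are all illustrative assumptions, not the authors' actual design; the names MTNet and MANet merely echo the abstract's terminology.

```python
import torch
import torch.nn as nn

class MTNet(nn.Module):
    """Toy translator mapping a 3-channel RGB image to a 1-channel NIR-like map."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, rgb):
        return self.body(rgb)

class MANet(nn.Module):
    """Toy PAD classifier fusing RGB with the translated modality (early fusion assumed)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)  # live vs. spoof

    def forward(self, rgb, nir_like):
        x = torch.cat([rgb, nir_like], dim=1)  # concatenate modalities channel-wise
        return self.head(self.features(x).flatten(1))

# Training uses paired (RGB, NIR) data; at test time only RGB is needed,
# since MT-Net synthesizes the NIR-like input.
mt_net, ma_net = MTNet(), MANet()
rgb = torch.randn(4, 3, 128, 128)
nir = torch.randn(4, 1, 128, 128)      # real NIR, available during training only
labels = torch.randint(0, 2, (4,))

fake_nir = mt_net(rgb)
trans_loss = nn.functional.l1_loss(fake_nir, nir)                  # translation term (assumed L1)
pad_loss = nn.functional.cross_entropy(ma_net(rgb, fake_nir), labels)
(trans_loss + pad_loss).backward()
```

At inference, one would call ma_net(rgb, mt_net(rgb)) with an RGB frame alone, which is what lets the framework run on VIS-only deployments while still benefiting from NIR supervision at training time.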
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2021.3065495