Visual Prompt Flexible-Modal Face Anti-Spoofing
Saved in:
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Recently, vision transformer based multimodal learning methods have been
proposed to improve the robustness of face anti-spoofing (FAS) systems.
However, multimodal face data collected in the real world are often imperfect
due to missing modalities from various imaging sensors. Flexible-modal FAS
[yu2023flexible] has therefore attracted growing attention; it aims to develop
a unified multimodal FAS model that is trained on complete multimodal face data
yet is insensitive to test-time missing modalities. In this paper, we tackle
one main challenge in flexible-modal FAS, i.e., when a modality is missing
either during training or testing in real-world situations. Inspired by the
recent success of prompt learning in language models, we propose Visual Prompt
flexible-modal FAS (VP-FAS), which learns modal-relevant prompts to adapt a
frozen pre-trained foundation model to the downstream flexible-modal FAS task.
Specifically, both vanilla visual prompts and residual contextual prompts are
plugged into multimodal transformers to handle general missing-modality cases,
while requiring less than 4% of the learnable parameters needed to train the
entire model. Furthermore, missing-modality regularization is proposed to force
models to learn consistent multimodal feature embeddings when partial
modalities are missing. Extensive experiments on two multimodal FAS benchmark
datasets demonstrate the effectiveness of our VP-FAS framework, which improves
performance under various missing-modality cases while alleviating the need
for heavy model re-training. |
---|---|
DOI: | 10.48550/arxiv.2307.13958 |
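The prompt-based adaptation described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: all dimensions, the ViT-Base parameter count, and the zero-placeholder for a missing modality are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions for illustration (not taken from the paper)
embed_dim = 768
num_patches = 196          # patch tokens per modality
num_prompts = 8            # learnable visual prompt tokens per layer
num_layers = 12

rng = np.random.default_rng(0)

# Frozen pre-trained backbone produces patch embeddings for each modality
# (e.g. RGB and depth); these weights are NOT updated during adaptation.
rgb_tokens = rng.standard_normal((num_patches, embed_dim))
depth_tokens = rng.standard_normal((num_patches, embed_dim))

# Learnable visual prompts, one set per transformer layer -- in a
# prompt-learning setup these are the only trained parameters.
prompts = rng.standard_normal((num_layers, num_prompts, embed_dim))

# A missing-modality case: depth is absent at test time, so a placeholder
# stands in for its tokens (zeros here, purely for illustration).
depth_missing = np.zeros_like(depth_tokens)

# Input sequence to the first transformer layer: prompt tokens prepended
# to the token sets of both modalities.
seq = np.concatenate([prompts[0], rgb_tokens, depth_missing], axis=0)
print(seq.shape)  # -> (400, 768), i.e. 8 prompts + 2 x 196 patch tokens

# The summary's "<4% learnable parameters" claim, under these toy sizes
# and an assumed ViT-Base-scale backbone of ~86M frozen parameters:
backbone_params = 86e6
prompt_params = prompts.size
print(prompt_params / backbone_params < 0.04)  # -> True
```

Because the backbone stays frozen, adapting to a new missing-modality pattern only means training the small prompt tensors, which is what lets the framework avoid heavy model re-training.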