Statistics-aware Audio-visual Deepfake Detector
Format: Article
Language: English
Abstract: In this paper, we propose an enhanced audio-visual deepfake detection method.
Recent methods in audio-visual deepfake detection mostly assess the
synchronization between audio and visual features. Although they have shown
promising results, they are based on maximizing or minimizing isolated feature
distances without considering feature statistics. Moreover, they rely on
cumbersome deep learning architectures and are heavily dependent on
empirically fixed hyperparameters. To overcome these limitations, we propose:
(1) a statistical feature loss that enhances the discrimination capability of
the model, instead of relying solely on feature distances; (2) the raw
waveform, rather than frequency-based representations, to describe the audio;
(3) a post-processing normalization of the fakeness score; and (4) a shallower
network to reduce the computational complexity. Experiments on the DFDC and
FakeAVCeleb datasets demonstrate the relevance of the proposed method.
DOI: 10.48550/arxiv.2407.11650
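The abstract does not give the exact loss or normalization. The following Python sketch is only a rough illustration of the contrast it draws between isolated feature distances and feature statistics. It makes several assumptions. Audio and visual embeddings are compared per segment by cosine distance. The "statistical" objective is a Fisher-style margin on the batch mean and standard deviation of real versus fake distances. The post-processing step is a min-max normalization of per-video scores. The function names, the `margin` parameter and these formulas are hypothetical choices, not the authors' formulation.

```python
import math
from statistics import mean, pstdev


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def video_distances(audio_feats, visual_feats):
    """Hypothetical helper: per-segment audio-visual distances for one video."""
    return [cosine_distance(a, v) for a, v in zip(audio_feats, visual_feats)]


def statistical_loss(real_dists, fake_dists, margin=1.0):
    """Illustrative statistics-aware loss over a batch of distances.

    Instead of pushing each distance up or down in isolation, it asks the
    mean fake distance to exceed the mean real distance by `margin` pooled
    standard deviations (a Fisher-style criterion). `margin` is an assumed
    hyperparameter, not taken from the paper.
    """
    gap = mean(fake_dists) - mean(real_dists)
    spread = math.sqrt(pstdev(real_dists) ** 2 + pstdev(fake_dists) ** 2) + 1e-8
    return max(0.0, margin - gap / spread)


def normalized_fakeness(raw_scores):
    """Hypothetical post-processing: min-max normalize per-video scores to [0, 1]."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All scores identical: no ranking information, map everything to 0.
        return [0.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]


if __name__ == "__main__":
    # Well-separated distance batches: the statistical margin is met, so the loss is zero.
    print(statistical_loss([0.1, 0.2, 0.15], [0.8, 0.9, 0.85]))
    # Overlapping batches are penalized even though some fake distances exceed real ones.
    print(statistical_loss([0.4, 0.6], [0.5, 0.7]))
    # Raw per-video scores (e.g. mean segment distance) rescaled to [0, 1].
    print(normalized_fakeness([2.0, 4.0, 3.0]))
```

In this sketch the loss depends on the whole distribution of distances in a batch, so a few well-separated pairs cannot hide a large overlap between real and fake samples. That is one possible reading of "feature statistics", and the paper may define it differently.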