Protecting World Leader Using Facial Speaking Pattern Against Deepfakes

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2022, Vol. 29, pp. 2078-2082
Authors: Chu, Beilin; You, Weike; Yang, Zhen; Zhou, Linna; Wang, Renying
Format: Article
Language: English
Description
Abstract: Face forgery instances involving celebrities, particularly world leaders, are on the rise, owing to the ease with which large quantities of their videos can be accessed on the Internet. While current face manipulation detectors achieve impressive results on several open datasets that incorporate persons with various identities, they show performance degradation on high-quality forgeries targeting celebrities. Moreover, these online videos usually undergo compression, making the detection task harder. Beyond face swapping, further manipulation techniques such as lip synchronization and image animation are applied to celebrities, which most prior work has not addressed. This paper proposes a dual-stream method that learns facial and speaking patterns to protect celebrities against deepfakes. We design an action unit module based on the Facial Action Coding System, along with an Action Unit Transformer (AUT), to exploit facial expression embeddings. In addition, the dual-stream architecture uses a Temporal Convolutional Network (TCN) to extract lip motion patterns and learns the relatedness between facial and speaking patterns. Our method can protect a person of interest (POI) against deepfakes in an end-to-end manner. Extensive experiments show that it achieves better performance and higher resistance to video compression than state-of-the-art detection models.
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2022.3205562
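
The abstract describes a dual-stream design: a transformer over facial Action Unit (AU) sequences and a Temporal Convolutional Network (TCN) over lip-motion features, fused for per-clip real/fake classification. Below is a minimal PyTorch sketch of that general architecture; the feature dimensions, layer counts, concatenation-based fusion, and all names (DualStreamDetector, TemporalBlock) are illustrative assumptions, not the authors' published configuration.

# Minimal sketch of a dual-stream deepfake detector in the spirit of the
# abstract: one stream applies a transformer to per-frame AU features, the
# other applies a TCN to lip-motion features. All hyperparameters below are
# assumptions for illustration only.
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """One dilated causal 1-D convolution block, the basic unit of a TCN."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                     # x: (B, C, T)
        out = self.conv(x)[..., :x.size(-1)]  # trim right padding (causal)
        return self.relu(out + x)             # residual connection

class DualStreamDetector(nn.Module):
    def __init__(self, au_dim=17, lip_dim=40, d_model=128, num_classes=2):
        super().__init__()
        # Stream 1: transformer encoder over AU sequences (assumed config).
        self.au_proj = nn.Linear(au_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
        self.au_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Stream 2: TCN over lip-motion features, with growing dilations.
        self.lip_proj = nn.Conv1d(lip_dim, d_model, kernel_size=1)
        self.tcn = nn.Sequential(TemporalBlock(d_model, dilation=1),
                                 TemporalBlock(d_model, dilation=2),
                                 TemporalBlock(d_model, dilation=4))
        # Fusion by concatenation, then a linear classification head.
        self.head = nn.Linear(2 * d_model, num_classes)

    def forward(self, au_seq, lip_seq):
        # au_seq: (B, T, au_dim), lip_seq: (B, T, lip_dim)
        au_feat = self.au_encoder(self.au_proj(au_seq)).mean(dim=1)
        lip_feat = self.tcn(self.lip_proj(lip_seq.transpose(1, 2))).mean(dim=-1)
        return self.head(torch.cat([au_feat, lip_feat], dim=-1))

# Usage on dummy data: a batch of 4 clips, 64 frames each.
model = DualStreamDetector()
logits = model(torch.randn(4, 64, 17), torch.randn(4, 64, 40))
print(logits.shape)  # torch.Size([4, 2]): one real/fake logit pair per clip

Mean pooling over time and concatenation are deliberately simple fusion choices here; the paper's stated goal of learning the relatedness between the facial and speaking streams would call for a cross-stream interaction, which this sketch does not attempt to reproduce.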