Exposing Fake Faces Through Deep Neural Networks Combining Content and Trace Feature Extractors
Saved in:
Published in: | IEEE Access, 2021, Vol. 9, p. 123493-123503 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | With the breakthrough of computer vision and deep learning, there has been a surge of realistic-looking fake face media manipulated by AI systems such as DeepFake or Face2Face, which alter facial identities or expressions. Fake faces were mostly created for fun, but their abuse has caused social unrest. For example, some celebrities have become victims of fake pornography made with DeepFake, and there are growing concerns about fake political speech videos created with Face2Face. To maintain individual privacy as well as social, political, and international security, it is imperative to develop models that detect fake faces in media. Previous research can be divided into general-purpose image forensics and face image forensics. The former has been studied for several decades and focuses on extracting hand-crafted features of the traces left in an image after manipulation; the latter is based on convolutional neural networks, mainly inspired by object detection models, that are specialized to extract images' content features. This paper proposes a hybrid face forensics framework based on a convolutional neural network that combines the two forensics approaches to enhance manipulation detection performance. To validate the proposed framework, we used a public Face2Face dataset and a custom DeepFake dataset that we collected ourselves. Experimental results on the two datasets show that the proposed model is more accurate and more robust across various video compression rates than previous methods. Through class activation map visualization, the proposed framework indicates which face parts are considered important and reveals tampering traces invisible to the naked eye. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3110859 |
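The abstract's central idea, fusing a content branch (CNN-style features of what the face looks like) with a trace branch (residual features of how the image was processed), can be illustrated with a toy sketch. The high-pass kernel and average-pooling "branches" below are illustrative assumptions standing in for the paper's learned extractors, not its actual architecture:

```python
import numpy as np

def trace_features(img):
    """Trace branch: a 3x3 high-pass residual filter (a stand-in for
    learned forensic trace filters) that suppresses image content and
    emphasizes manipulation artifacts."""
    k = np.array([[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]], dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out.ravel()

def content_features(img, pool=4):
    """Content branch: coarse average pooling as a stand-in for CNN
    content features (identity, expression, texture)."""
    h, w = img.shape
    img = img[:h - h % pool, :w - w % pool]
    pooled = img.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
    return pooled.ravel()

def hybrid_features(img):
    """Concatenate both branches, mirroring the hybrid fusion idea
    before a downstream real/fake classifier."""
    return np.concatenate([content_features(img), trace_features(img)])

rng = np.random.default_rng(0)
face = rng.random((16, 16))   # toy grayscale "face" crop
feat = hybrid_features(face)  # 4*4 content features + 14*14 trace features
print(feat.shape)
```

In the actual framework both branches would be trained end to end and fused inside the network; this sketch only shows why the two feature types are complementary: the trace branch responds to pixel-level residuals that the content branch averages away.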