Unsupervised Learning-Based Framework for Deepfake Video Detection

With the continuous development of computer hardware and deep learning technology, it has become easy to swap faces in videos using emerging multimedia tampering tools, the most popular being deepfake. This brings a series of new security threats. Although much forensic research has focused on this new type of manipulation and achieved high detection accuracy, most of it is based on supervised learning and requires a large number of labeled samples for training. In this paper, we develop a novel unsupervised method for identifying deepfake videos. The main idea behind our proposed method is that the face region in a real video is captured by a camera, while its counterpart in a deepfake video is usually generated by a computer; the provenance of the two videos is entirely different. Specifically, our method consists of two clustering stages based on Photo-Response Non-Uniformity (PRNU) and the noiseprint feature. First, the PRNU fingerprint of each video frame is extracted and used to cluster videos sharing a full-size identical source (whether real or fake). Second, we extract the noiseprint from the face region of each video and use it to identify (re-cluster, as a binary classification task) the deepfake samples within each cluster. Numerical experiments verify that our proposed unsupervised method performs very well on our own dataset and on the benchmark FF++ dataset. More importantly, its performance rivals that of supervised state-of-the-art detectors.
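The first stage of the pipeline described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the PRNU fingerprint is estimated as the average noise residual over a video's frames (here with a simple box-blur denoiser standing in for the wavelet denoiser commonly used in PRNU work), and fingerprints are compared by normalized cross-correlation so that videos from the same source camera cluster together. All function names and parameters are illustrative.

```python
import numpy as np

def denoise(frame, k=3):
    # Box-blur denoiser (a stand-in for the wavelet denoiser typical in
    # PRNU estimation); frame is a 2-D grayscale array.
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros(frame.shape, dtype=float)
    h, w = frame.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def prnu_fingerprint(frames):
    # PRNU estimate: average the noise residuals (frame minus its denoised
    # version) over all frames of one video, then zero-mean the result.
    residuals = [f - denoise(f) for f in frames]
    fp = np.mean(residuals, axis=0)
    return fp - fp.mean()

def ncc(a, b):
    # Normalized cross-correlation between two fingerprints; values near 1
    # suggest the same source sensor, values near 0 suggest different sources.
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

With real videos, one would extract grayscale frames, compute one fingerprint per video, and group videos whose pairwise NCC exceeds a threshold; the second stage (noiseprint re-clustering) would then operate on face crops within each group.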

Bibliographic Details
Published in: IEEE transactions on multimedia, 2023, Vol. 25, p. 4785-4799
Publisher: IEEE (Piscataway)
Main authors: Zhang, Li; Qiao, Tong; Xu, Ming; Zheng, Ning; Xie, Shichuang
Format: Article
Language: English
Online access: Order full text
description With the continuous development of computer hardware and deep learning technology, it has become easy to swap faces in videos using emerging multimedia tampering tools, the most popular being deepfake. This brings a series of new security threats. Although much forensic research has focused on this new type of manipulation and achieved high detection accuracy, most of it is based on supervised learning and requires a large number of labeled samples for training. In this paper, we develop a novel unsupervised method for identifying deepfake videos. The main idea behind our proposed method is that the face region in a real video is captured by a camera, while its counterpart in a deepfake video is usually generated by a computer; the provenance of the two videos is entirely different. Specifically, our method consists of two clustering stages based on Photo-Response Non-Uniformity (PRNU) and the noiseprint feature. First, the PRNU fingerprint of each video frame is extracted and used to cluster videos sharing a full-size identical source (whether real or fake). Second, we extract the noiseprint from the face region of each video and use it to identify (re-cluster, as a binary classification task) the deepfake samples within each cluster. Numerical experiments verify that our proposed unsupervised method performs very well on our own dataset and on the benchmark FF++ dataset. More importantly, its performance rivals that of supervised state-of-the-art detectors.
DOI: 10.1109/TMM.2022.3182509
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Electronic Library (IEL)
Subjects:
Cameras
Clustering
Cyberspace
Datasets
Deception
Deep learning
Deepfake detection
Faces
Feature extraction
Forensics
Multimedia
noiseprint
Nonuniformity
PRNU
Streaming media
Supervised learning
Training
Unsupervised learning
Video
video clustering