Unsupervised Learning-Based Framework for Deepfake Video Detection

With the continuous development of computer hardware and deep learning technology, it has become easy to swap faces in videos using emerging multimedia tampering tools, the most popular being deepfake. This brings a series of new security threats. Although much forensic research has focused on this new type of manipulation and achieved high detection accuracy, most of it is based on supervised learning and requires a large number of labeled samples for training. In this paper, we develop a novel unsupervised method for identifying deepfake videos. The main idea behind our proposed method is that the face region in a real video is captured by a camera, while its counterpart in a deepfake video is usually generated by a computer; the provenance of the two videos is entirely different. Specifically, our method consists of two clustering stages based on Photo-Response Non-Uniformity (PRNU) and the noiseprint feature. First, the PRNU fingerprint of each video frame is extracted and used to cluster videos sharing a full-size identical source (whether real or fake). Second, we extract the noiseprint from the face region of each video and use it to identify (re-cluster, as a binary classification task) the deepfake samples within each cluster. Numerical experiments verify that our proposed unsupervised method performs very well on our own dataset and on the benchmark FF++ dataset. More importantly, its performance rivals that of supervised state-of-the-art detectors.
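The first stage of the pipeline described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the PRNU fingerprint is estimated as the average noise residual over a video's frames (here with a simple box-blur denoiser standing in for the wavelet denoiser commonly used in PRNU work), and fingerprints are compared by normalized cross-correlation so that videos from the same source camera cluster together. All function names and parameters are illustrative.

```python
import numpy as np

def denoise(frame, k=3):
    # Box-blur denoiser (a stand-in for the wavelet denoiser typical in
    # PRNU estimation); frame is a 2-D grayscale array.
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros(frame.shape, dtype=float)
    h, w = frame.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def prnu_fingerprint(frames):
    # PRNU estimate: average the noise residuals (frame minus its denoised
    # version) over all frames of one video, then zero-mean the result.
    residuals = [f - denoise(f) for f in frames]
    fp = np.mean(residuals, axis=0)
    return fp - fp.mean()

def ncc(a, b):
    # Normalized cross-correlation between two fingerprints; values near 1
    # suggest the same source sensor, values near 0 suggest different sources.
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

With real videos, one would extract grayscale frames, compute one fingerprint per video, and group videos whose pairwise NCC exceeds a threshold; the second stage (noiseprint re-clustering) would then operate on face crops within each group.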

Bibliographic Details
Published in: IEEE transactions on multimedia, 2023, Vol. 25, p. 4785-4799
Publisher: IEEE (Piscataway)
Main authors: Zhang, Li; Qiao, Tong; Xu, Ming; Zheng, Ning; Xie, Shichuang
Format: Article
Language: English
Online access: Order full text
description With the continuous development of computer hardware and deep learning technology, it has become easy to swap faces in videos using emerging multimedia tampering tools, the most popular being deepfake. This brings a series of new security threats. Although much forensic research has focused on this new type of manipulation and achieved high detection accuracy, most of it is based on supervised learning and requires a large number of labeled samples for training. In this paper, we develop a novel unsupervised method for identifying deepfake videos. The main idea behind our proposed method is that the face region in a real video is captured by a camera, while its counterpart in a deepfake video is usually generated by a computer; the provenance of the two videos is entirely different. Specifically, our method consists of two clustering stages based on Photo-Response Non-Uniformity (PRNU) and the noiseprint feature. First, the PRNU fingerprint of each video frame is extracted and used to cluster videos sharing a full-size identical source (whether real or fake). Second, we extract the noiseprint from the face region of each video and use it to identify (re-cluster, as a binary classification task) the deepfake samples within each cluster. Numerical experiments verify that our proposed unsupervised method performs very well on our own dataset and on the benchmark FF++ dataset. More importantly, its performance rivals that of supervised state-of-the-art detectors.
DOI: 10.1109/TMM.2022.3182509
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Electronic Library (IEL)
Subjects:
Cameras
Clustering
Cyberspace
Datasets
Deception
Deep learning
Deepfake detection
Faces
Feature extraction
Forensics
Multimedia
noiseprint
Nonuniformity
PRNU
Streaming media
Supervised learning
Training
Unsupervised learning
Video
video clustering