Unsupervised Learning-Based Framework for Deepfake Video Detection
Published in: | IEEE transactions on multimedia 2023, Vol.25, p.4785-4799 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
container_end_page | 4799 |
---|---|
container_issue | |
container_start_page | 4785 |
container_title | IEEE transactions on multimedia |
container_volume | 25 |
creator | Zhang, Li; Qiao, Tong; Xu, Ming; Zheng, Ning; Xie, Shichuang |
description | With the continuous development of computer hardware and deep learning technology, it has become easy to swap faces in videos with emerging multimedia tampering tools, the most popular being deepfake. This brings a series of new security threats. Although much forensic research has focused on this new type of manipulation and achieved high detection accuracy, most of it is based on supervised learning mechanisms that require a large number of labeled samples for training. In this paper, we develop a novel unsupervised approach for identifying deepfake videos. The main rationale behind our proposed method is that the face region in a real video is captured by a camera, while its counterpart in a deepfake video is usually generated by a computer; the provenance of the two videos is totally different. Specifically, our method includes two clustering stages based on Photo-Response Non-Uniformity (PRNU) and the noiseprint feature. First, the PRNU fingerprint of each video frame is extracted and used to cluster videos that share an identical full-size source (regardless of whether each is real or fake). Second, we extract the noiseprint from the face region of each video and use it to identify (re-cluster, for the task of binary classification) the deepfake samples within each cluster. Numerical experiments verify that our proposed unsupervised method performs very well on our own dataset and on the benchmark FF++ dataset. More importantly, its performance rivals that of supervised state-of-the-art detectors. |
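The first-stage idea described in the abstract (average per-frame noise residuals into a camera fingerprint, then group videos whose fingerprints correlate) can be sketched in simplified form. Everything below is an illustrative stand-in, not the paper's method: a 3x3 mean filter replaces the wavelet denoiser normally used for PRNU extraction, and the "videos" are synthetic NumPy arrays sharing a fabricated sensor pattern.

```python
# Hedged sketch of stage-1 PRNU-style source clustering; NOT the authors'
# implementation. A 3x3 mean filter stands in for a proper PRNU denoiser.
import numpy as np

def noise_residual(frame):
    """Approximate sensor-noise residual: frame minus a smoothed copy."""
    h, w = frame.shape
    padded = np.pad(frame, 1, mode="edge")
    smooth = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return frame - smooth

def fingerprint(frames):
    """Camera fingerprint estimate: average residuals over many frames,
    so scene content averages out and the fixed sensor pattern remains."""
    return np.mean([noise_residual(f) for f in frames], axis=0)

def ncc(a, b):
    """Normalized cross-correlation between two fingerprint maps."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: two videos carrying the same synthetic sensor pattern, one without.
rng = np.random.default_rng(0)
prnu = rng.normal(0, 2.0, (32, 32))                               # fixed sensor pattern
vid_a = [rng.normal(128, 8, (32, 32)) + prnu for _ in range(20)]
vid_b = [rng.normal(128, 8, (32, 32)) + prnu for _ in range(20)]  # same source as vid_a
vid_c = [rng.normal(128, 8, (32, 32)) - prnu for _ in range(20)]  # different source

same = ncc(fingerprint(vid_a), fingerprint(vid_b))
diff = ncc(fingerprint(vid_a), fingerprint(vid_c))
assert same > diff  # same-source videos correlate more strongly
```

Thresholding these pairwise correlations is what lets the first stage group frames or videos by source device before the face-region noiseprint check runs within each group.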
doi_str_mv | 10.1109/TMM.2022.3182509 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-9210 |
ispartof | IEEE transactions on multimedia, 2023, Vol.25, p.4785-4799 |
issn | 1520-9210 1941-0077 |
language | eng |
recordid | cdi_ieee_primary_9795231 |
source | IEEE Electronic Library (IEL) |
subjects | Cameras; Clustering; Cyberspace; Datasets; Deception; Deep learning; Deepfake detection; Faces; Feature extraction; Forensics; Multimedia; noiseprint; Nonuniformity; PRNU; Streaming media; Supervised learning; Training; Unsupervised learning; Video; video clustering |
title | Unsupervised Learning-Based Framework for Deepfake Video Detection |