A Multimodal Framework for Deepfake Detection

The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation, fraud, and severe implications for personal privacy and security. Our research addresses the critical issue of deepfakes through an innovative multimodal approach, targeting both visual and auditory elements. This comprehensive strategy recognizes that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, a model employing advanced feature extraction techniques was developed, extracting nine distinct facial characteristics and then applying various machine learning and deep learning models. For auditory analysis, our model leverages mel-spectrogram analysis for feature extraction and then applies various machine learning and deep learning models. To enable a combined analysis, real and deepfake audio tracks in the original dataset were swapped for testing purposes, ensuring balanced samples. Using our proposed models for video and audio classification, i.e., an Artificial Neural Network and VGG19, the overall sample is classified as deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%.
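
As a rough illustration of the fusion strategy summarized in the abstract (not the authors' released code), the sketch below pairs mel-spectrogram extraction for the auditory branch with an OR-style decision rule over per-modality predictions. The function names, the use of librosa, and the 0.5 decision threshold are assumptions made for this example; the paper's actual models (an ANN over nine facial features and VGG19 over mel-spectrograms) are only referenced, not implemented here.

```python
# Illustrative sketch only: the record describes the pipeline at a high level
# (nine facial features -> ANN, mel-spectrograms -> VGG19); the helper names,
# threshold, and librosa dependency below are assumptions for demonstration.
import numpy as np
import librosa


def audio_features(path: str, sr: int = 16000) -> np.ndarray:
    """Log-scaled mel-spectrogram, the auditory representation the abstract describes."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    return librosa.power_to_db(mel, ref=np.max)


def fuse(video_prob_fake: float, audio_prob_fake: float, threshold: float = 0.5) -> bool:
    """OR-style late fusion: the sample is labelled deepfake if either branch flags it."""
    return video_prob_fake >= threshold or audio_prob_fake >= threshold


# Example: the visual branch finds the frames plausible (0.10) but the audio
# branch flags the track as synthetic (0.92), so the combined verdict is "deepfake".
print(fuse(0.10, 0.92))  # True
```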

Bibliographic Details

Published in: arXiv.org, 2024-10
Main Authors: Gandhi, Kashish; Kulkarni, Prutha; Shah, Taran; Chaudhari, Piyush; Narvekar, Meera; Ghag, Kranti
Format: Article
Language: English
Online Access: Full text
DOI: 10.48550/arxiv.2410.03487
EISSN: 2331-8422
Subjects:
Artificial neural networks
Audio data
Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
Computer Science - Logic in Computer Science
Deception
Deep learning
Deepfake
Feature extraction
Machine learning