A Multimodal Framework for Deepfake Detection

The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation, fraud, and severe implications for personal privacy and security. Our research addresses the critical issue of deepfakes through an innovative multimodal approach, targeting both visual and auditory elements. This comprehensive strategy recognizes that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, a model employing advanced feature extraction techniques was developed, extracting nine distinct facial characteristics and then applying various machine learning and deep learning models. For auditory analysis, our model leverages mel-spectrogram analysis for feature extraction and then applies various machine learning and deep learning models. To enable a combined analysis, real and deepfake audio tracks in the original dataset were swapped for testing purposes, ensuring balanced samples. Using our proposed models for video and audio classification, i.e., an Artificial Neural Network and VGG19, the overall sample is classified as deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%.
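
As a rough illustration of the fusion strategy summarized in the abstract (not the authors' released code), the sketch below pairs mel-spectrogram extraction for the auditory branch with an OR-style decision rule over per-modality predictions. The function names, the use of librosa, and the 0.5 decision threshold are assumptions made for this example; the paper's actual models (an ANN over nine facial features and VGG19 over mel-spectrograms) are only referenced, not implemented here.

```python
# Illustrative sketch only: the record describes the pipeline at a high level
# (nine facial features -> ANN, mel-spectrograms -> VGG19); the helper names,
# threshold, and librosa dependency below are assumptions for demonstration.
import numpy as np
import librosa


def audio_features(path: str, sr: int = 16000) -> np.ndarray:
    """Log-scaled mel-spectrogram, the auditory representation the abstract describes."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    return librosa.power_to_db(mel, ref=np.max)


def fuse(video_prob_fake: float, audio_prob_fake: float, threshold: float = 0.5) -> bool:
    """OR-style late fusion: the sample is labelled deepfake if either branch flags it."""
    return video_prob_fake >= threshold or audio_prob_fake >= threshold


# Example: the visual branch finds the frames plausible (0.10) but the audio
# branch flags the track as synthetic (0.92), so the combined verdict is "deepfake".
print(fuse(0.10, 0.92))  # True
```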

Bibliographic Details

Published in: arXiv.org, 2024-10
Main Authors: Gandhi, Kashish; Kulkarni, Prutha; Shah, Taran; Chaudhari, Piyush; Narvekar, Meera; Ghag, Kranti
Format: Article
Language: English
Online Access: Full text
DOI: 10.48550/arxiv.2410.03487
EISSN: 2331-8422
Subjects:
Artificial neural networks
Audio data
Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
Computer Science - Logic in Computer Science
Deception
Deep learning
Deepfake
Feature extraction
Machine learning