A Multimodal Framework for Deepfake Detection
The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation and fraud, with severe implications for personal privacy...
Saved in:
Published in: | arXiv.org 2024-10 |
---|---|
Main authors: | Gandhi, Kashish; Kulkarni, Prutha; Shah, Taran; Chaudhari, Piyush; Narvekar, Meera; Ghag, Kranti |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Gandhi, Kashish; Kulkarni, Prutha; Shah, Taran; Chaudhari, Piyush; Narvekar, Meera; Ghag, Kranti |
description | The rapid advancement of deepfake technology poses a significant threat to digital media integrity. Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality. This creates risks of misinformation and fraud, with severe implications for personal privacy and security. Our research addresses the critical issue of deepfakes through an innovative multimodal approach, targeting both visual and auditory elements. This comprehensive strategy recognizes that human perception integrates multiple sensory inputs, particularly visual and auditory information, to form a complete understanding of media content. For visual analysis, a model employing advanced feature extraction techniques was developed, extracting nine distinct facial characteristics and then applying various machine learning and deep learning models. For auditory analysis, our model leverages mel-spectrogram analysis for feature extraction and then applies various machine learning and deep learning models. To achieve a combined analysis, real and deepfake audio in the original dataset were swapped for testing purposes, ensuring balanced samples. Using our proposed models for video and audio classification, i.e., an Artificial Neural Network and VGG19, the overall sample is classified as deepfake if either component is identified as such. Our multimodal framework combines visual and auditory analyses, yielding an accuracy of 94%. |
doi_str_mv | 10.48550/arxiv.2410.03487 |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2410_03487 |
source | arXiv.org; Free E-Journals |
subjects | Artificial neural networks; Audio data; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer Science - Logic in Computer Science; Deception; Deep learning; Deepfake; Feature extraction; Machine learning |
title | A Multimodal Framework for Deepfake Detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T16%3A57%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Multimodal%20Framework%20for%20Deepfake%20Detection&rft.jtitle=arXiv.org&rft.au=Gandhi,%20Kashish&rft.date=2024-10-04&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2410.03487&rft_dat=%3Cproquest_arxiv%3E3113848005%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3113848005&rft_id=info:pmid/&rfr_iscdi=true |
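The description field above outlines a two-branch pipeline: an Artificial Neural Network over nine extracted facial characteristics for the visual branch, a VGG19 classifier over mel-spectrograms for the audio branch, and an OR-rule fusion that labels a sample as deepfake if either branch flags it. The record does not include code; the sketch below is a minimal illustration of the audio feature step and the fusion rule, assuming librosa for mel-spectrogram extraction, already-trained branch models, and a 0.5 decision threshold per branch. Function and model names are placeholders, not taken from the paper.

```python
# Minimal sketch of the multimodal decision rule described in the abstract.
# Assumptions (not from the paper): librosa for mel-spectrogram extraction,
# pre-trained branch models, and a 0.5 decision threshold per branch.
import numpy as np
import librosa


def audio_to_log_mel(wav_path: str, sr: int = 16000, n_mels: int = 128) -> np.ndarray:
    """Load an audio file and return a log-scaled mel-spectrogram (audio-branch input)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def or_rule_fusion(p_visual: float, p_audio: float, threshold: float = 0.5) -> bool:
    """Label the sample as deepfake if either branch's score crosses the threshold."""
    return p_visual >= threshold or p_audio >= threshold


# Hypothetical usage with two pre-trained branch models (names are placeholders):
# p_visual = float(ann_model.predict(facial_features)[0, 0])   # nine facial characteristics -> ANN
# p_audio  = float(vgg19_model.predict(mel_batch)[0, 0])       # mel-spectrogram image -> VGG19
# label = "deepfake" if or_rule_fusion(p_visual, p_audio) else "real"
```

As a design note, an OR rule of this kind favors recall over precision: a single confident branch is enough to flag a sample, which matches the abstract's goal of catching manipulation in either modality.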