AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary contents, fueling the growing threat of visual disinformation. A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs)....

Bibliographic Details
Main Authors:	Ricker, Jonas; Lukovnikov, Denis; Fischer, Asja
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition
Online Access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Ricker, Jonas ; Lukovnikov, Denis ; Fischer, Asja
description	With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary contents, fueling the growing threat of visual disinformation. A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs). In contrast to conventional diffusion models, LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space. Despite their relevance, the forensic analysis of LDMs is still in its infancy. In this work we propose AEROBLADE, a novel detection method which exploits an inherent component of LDMs: the AE used to transform images between image and latent space. We find that generated images can be more accurately reconstructed by the AE than real images, allowing for a simple detection approach based on the reconstruction error. Most importantly, our method is easy to implement and does not require any training, yet nearly matches the performance of detectors that rely on extensive training. We empirically demonstrate that AEROBLADE is effective against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, our approach allows for the qualitative analysis of images, which can be leveraged for identifying inpainted regions. We release our code and data at https://github.com/jonasricker/aeroblade .
doi_str_mv	10.48550/arxiv.2401.17879
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2401_17879</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2401_17879</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-b176810d1c2c4a76e8f034ea2bf86fc09a71d320df5b06af9c0dfa1ffe2249d33</originalsourceid><addsrcrecordid>eNotj01LxDAYhHPxIKs_wJP5A61J-pHWW912daGwsNRzeZu8WQJuImkq-u_dr9MMw8zAQ8gTZ2leFQV7gfBrf1KRM55yWcn6nuim2-_e-qbtXukQwDrrDskmINIWI6povaPe0B4iukhba8wyn7PtEQ4408_51KfNEj065TUGukfl3RzDct12IfjwQO4MfM34eNMVGTbdsP5I-t37dt30CZSyTiYuy4ozzZVQOcgSK8OyHEFMpiqNYjVIrjPBtCkmVoKp1ckCNwaFyGudZSvyfL29YI7fwR4h_I1n3PGCm_0DmHNQ1w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error</title><source>arXiv.org</source><creator>Ricker, Jonas ; Lukovnikov, Denis ; Fischer, Asja</creator><creatorcontrib>Ricker, Jonas ; Lukovnikov, Denis ; Fischer, Asja</creatorcontrib><description>With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary contents, fueling the growing threat of visual disinformation. A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs). In contrast to conventional diffusion models, LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space. Despite their relevance, the forensic analysis of LDMs is still in its infancy. In this work we propose AEROBLADE, a novel detection method which exploits an inherent component of LDMs: the AE used to transform images between image and latent space. 
We find that generated images can be more accurately reconstructed by the AE than real images, allowing for a simple detection approach based on the reconstruction error. Most importantly, our method is easy to implement and does not require any training, yet nearly matches the performance of detectors that rely on extensive training. We empirically demonstrate that AEROBLADE is effective against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, our approach allows for the qualitative analysis of images, which can be leveraged for identifying inpainted regions. We release our code and data at https://github.com/jonasricker/aeroblade .</description><identifier>DOI: 10.48550/arxiv.2401.17879</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2024-01</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2401.17879$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2401.17879$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Ricker, Jonas</creatorcontrib><creatorcontrib>Lukovnikov, Denis</creatorcontrib><creatorcontrib>Fischer, Asja</creatorcontrib><title>AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error</title><description>With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary contents, fueling the growing threat of visual disinformation. 
A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs). In contrast to conventional diffusion models, LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space. Despite their relevance, the forensic analysis of LDMs is still in its infancy. In this work we propose AEROBLADE, a novel detection method which exploits an inherent component of LDMs: the AE used to transform images between image and latent space. We find that generated images can be more accurately reconstructed by the AE than real images, allowing for a simple detection approach based on the reconstruction error. Most importantly, our method is easy to implement and does not require any training, yet nearly matches the performance of detectors that rely on extensive training. We empirically demonstrate that AEROBLADE is effective against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, our approach allows for the qualitative analysis of images, which can be leveraged for identifying inpainted regions. 
We release our code and data at https://github.com/jonasricker/aeroblade .</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj01LxDAYhHPxIKs_wJP5A61J-pHWW912daGwsNRzeZu8WQJuImkq-u_dr9MMw8zAQ8gTZ2leFQV7gfBrf1KRM55yWcn6nuim2-_e-qbtXukQwDrrDskmINIWI6povaPe0B4iukhba8wyn7PtEQ4408_51KfNEj065TUGukfl3RzDct12IfjwQO4MfM34eNMVGTbdsP5I-t37dt30CZSyTiYuy4ozzZVQOcgSK8OyHEFMpiqNYjVIrjPBtCkmVoKp1ckCNwaFyGudZSvyfL29YI7fwR4h_I1n3PGCm_0DmHNQ1w</recordid><startdate>20240131</startdate><enddate>20240131</enddate><creator>Ricker, Jonas</creator><creator>Lukovnikov, Denis</creator><creator>Fischer, Asja</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240131</creationdate><title>AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error</title><author>Ricker, Jonas ; Lukovnikov, Denis ; Fischer, Asja</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-b176810d1c2c4a76e8f034ea2bf86fc09a71d320df5b06af9c0dfa1ffe2249d33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Ricker, Jonas</creatorcontrib><creatorcontrib>Lukovnikov, Denis</creatorcontrib><creatorcontrib>Fischer, Asja</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ricker, Jonas</au><au>Lukovnikov, Denis</au><au>Fischer, Asja</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AEROBLADE: Training-Free Detection of Latent Diffusion Images 
Using Autoencoder Reconstruction Error</atitle><date>2024-01-31</date><risdate>2024</risdate><abstract>With recent text-to-image models, anyone can generate deceptively realistic images with arbitrary contents, fueling the growing threat of visual disinformation. A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs). In contrast to conventional diffusion models, LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space. Despite their relevance, the forensic analysis of LDMs is still in its infancy. In this work we propose AEROBLADE, a novel detection method which exploits an inherent component of LDMs: the AE used to transform images between image and latent space. We find that generated images can be more accurately reconstructed by the AE than real images, allowing for a simple detection approach based on the reconstruction error. Most importantly, our method is easy to implement and does not require any training, yet nearly matches the performance of detectors that rely on extensive training. We empirically demonstrate that AEROBLADE is effective against state-of-the-art LDMs, including Stable Diffusion and Midjourney. Beyond detection, our approach allows for the qualitative analysis of images, which can be leveraged for identifying inpainted regions. We release our code and data at https://github.com/jonasricker/aeroblade .</abstract><doi>10.48550/arxiv.2401.17879</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2401.17879
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2401_17879
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T00%3A17%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AEROBLADE:%20Training-Free%20Detection%20of%20Latent%20Diffusion%20Images%20Using%20Autoencoder%20Reconstruction%20Error&rft.au=Ricker,%20Jonas&rft.date=2024-01-31&rft_id=info:doi/10.48550/arxiv.2401.17879&rft_dat=%3Carxiv_GOX%3E2401_17879%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
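
The detection idea summarized in the description field (reconstruct an image with the autoencoder and treat a low reconstruction error as evidence that the image was generated) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_encode` and `toy_decode` are a hypothetical stand-in (block averaging and nearest-neighbour upsampling) rather than the autoencoder of an actual LDM, mean squared error is a placeholder distance, and the threshold is arbitrary. The authors' actual implementation is the one published in their repository.

```python
import numpy as np


def toy_encode(image, factor=2):
    """Hypothetical encoder: average each factor x factor block (not an LDM AE)."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def toy_decode(latent, factor=2):
    """Hypothetical decoder: nearest-neighbour upsampling back to image size."""
    return np.repeat(np.repeat(latent, factor, axis=0), factor, axis=1)


def reconstruction_error(image, encode, decode):
    """Mean squared error between an image and its autoencoder reconstruction.

    MSE is a placeholder distance chosen for this sketch.
    """
    reconstruction = decode(encode(image))
    return float(np.mean((image - reconstruction) ** 2))


def looks_generated(image, encode, decode, threshold):
    """Flag an image as likely generated when its reconstruction error is low.

    The threshold is an assumed, user-chosen value; no training is involved.
    """
    return reconstruction_error(image, encode, decode) < threshold


# An image that lies exactly in the decoder's output space reconstructs perfectly,
# mimicking a "generated" image; random noise does not.
in_range_image = toy_decode(np.arange(16.0).reshape(4, 4))
noise_image = np.random.default_rng(0).normal(size=(8, 8))

print(looks_generated(in_range_image, toy_encode, toy_decode, threshold=0.01))
print(looks_generated(noise_image, toy_encode, toy_decode, threshold=0.01))
```

In this toy setting the first image plays the role of a generated image, since it was produced by the decoder itself, while the noise image plays the role of a real image that the autoencoder cannot reproduce exactly.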