Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite recent work showing preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic content from the diffusion model, which hinders the radiance field from converging to crisp and deterministic geometry.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Lin, Chieh Hubert, Kim, Changil, Huang, Jia-Bin, Li, Qinbo, Ma, Chih-Yao, Kopf, Johannes, Yang, Ming-Hsuan, Tseng, Hung-Yu
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Description: Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite recent work showing preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic content from the diffusion model, which hinders the radiance field from converging to crisp and deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift that is incoherent with the image condition, due to auto-encoding errors. These two problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF
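The loss layout described in the abstract — excluding pixel-distance losses from the inpainted region and applying an adversarial loss only there — can be illustrated with a minimal pure-Python sketch. All function and variable names here are illustrative assumptions, not the authors' implementation:

```python
import math

def masked_losses(render, target, mask, disc_logits):
    """Toy sketch of the masked loss layout (illustrative, not MALD-NeRF code).
    Inputs are flat lists of floats; mask[i] is 1.0 inside the inpainted
    (uncovered) region and 0.0 on observed pixels."""
    # Pixel-distance loss only on observed pixels: the abstract reports that
    # pixel losses are harmful inside the inpainted region, so we exclude it.
    obs = [1.0 - m for m in mask]
    denom = max(sum(obs), 1.0)
    pixel_loss = sum(o * (r - t) ** 2
                     for o, r, t in zip(obs, render, target)) / denom
    # Masked adversarial (non-saturating) generator loss: the discriminator is
    # consulted only inside the mask, where the diffusion prior hallucinates
    # content and the textural shift appears.
    inside = [l for l, m in zip(disc_logits, mask) if m > 0.5]
    adv_loss = sum(math.log1p(math.exp(-l)) for l in inside) / max(len(inside), 1)
    return pixel_loss, adv_loss
```

With a perfect reconstruction on observed pixels and a confident discriminator inside the mask, both terms go to (near) zero; the point of the sketch is only that the two losses act on disjoint pixel sets.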
DOI: 10.48550/arXiv.2404.09995
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning