Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite recent work showing preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic content from the diffusion model, which hinders the radiance field from converging to crisp and deterministic geometry.

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Lin, Chieh Hubert, Kim, Changil, Huang, Jia-Bin, Li, Qinbo, Ma, Chih-Yao, Kopf, Johannes, Yang, Ming-Hsuan, Tseng, Hung-Yu
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Description: Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite recent work showing preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic content from the diffusion model, which hinders the radiance field from converging to crisp and deterministic geometry. Moreover, applying latent diffusion models to real data often yields a textural shift that is incoherent with the image condition, due to auto-encoding errors. These two problems are further reinforced by the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. In our analyses, we also found that the commonly used pixel and perceptual losses are harmful to the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF
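The loss layout described in the abstract — excluding pixel-distance losses from the inpainted region and applying an adversarial loss only there — can be illustrated with a minimal pure-Python sketch. All function and variable names here are illustrative assumptions, not the authors' implementation:

```python
import math

def masked_losses(render, target, mask, disc_logits):
    """Toy sketch of the masked loss layout (illustrative, not MALD-NeRF code).
    Inputs are flat lists of floats; mask[i] is 1.0 inside the inpainted
    (uncovered) region and 0.0 on observed pixels."""
    # Pixel-distance loss only on observed pixels: the abstract reports that
    # pixel losses are harmful inside the inpainted region, so we exclude it.
    obs = [1.0 - m for m in mask]
    denom = max(sum(obs), 1.0)
    pixel_loss = sum(o * (r - t) ** 2
                     for o, r, t in zip(obs, render, target)) / denom
    # Masked adversarial (non-saturating) generator loss: the discriminator is
    # consulted only inside the mask, where the diffusion prior hallucinates
    # content and the textural shift appears.
    inside = [l for l, m in zip(disc_logits, mask) if m > 0.5]
    adv_loss = sum(math.log1p(math.exp(-l)) for l in inside) / max(len(inside), 1)
    return pixel_loss, adv_loss
```

With a perfect reconstruction on observed pixels and a confident discriminator inside the mask, both terms go to (near) zero; the point of the sketch is only that the two losses act on disjoint pixel sets.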
DOI: 10.48550/arXiv.2404.09995
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning