Taming Latent Diffusion Model for Neural Radiance Field Inpainting
Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Although recent work has shown preliminary success in editing a reconstructed NeRF with a diffusion prior, these methods still struggle to synthesize reasonable geometry in completely uncovered regions. One majo...
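Main authors: | Lin, Chieh Hubert; Kim, Changil; Huang, Jia-Bin; Li, Qinbo; Ma, Chih-Yao; Kopf, Johannes; Yang, Ming-Hsuan; Tseng, Hung-Yu |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
Online access: | Order full text |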
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Lin, Chieh Hubert; Kim, Changil; Huang, Jia-Bin; Li, Qinbo; Ma, Chih-Yao; Kopf, Johannes; Yang, Ming-Hsuan; Tseng, Hung-Yu |
description | Neural Radiance Field (NeRF) is a representation for 3D reconstruction from
multi-view images. Although recent work has shown preliminary success in
editing a reconstructed NeRF with a diffusion prior, these methods still struggle to
synthesize reasonable geometry in completely uncovered regions. One major
reason is the high diversity of synthetic contents from the diffusion model,
which hinders the radiance field from converging to a crisp and deterministic
geometry. Moreover, applying latent diffusion models to real data often yields
a textural shift that is incoherent with the image condition due to auto-encoding errors.
These two problems are further reinforced by the use of pixel-distance
losses. To address these issues, we propose tempering the diffusion model's
stochasticity with per-scene customization and mitigating the textural shift
with masked adversarial training. During the analyses, we also found that the
commonly used pixel and perceptual losses are harmful in the NeRF inpainting
task. Through rigorous experiments, our framework yields state-of-the-art NeRF
inpainting results on various real-world scenes. Project page:
https://hubert0527.github.io/MALD-NeRF |
doi_str_mv | 10.48550/arxiv.2404.09995 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2404_09995</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2404_09995</sourcerecordid><originalsourceid>FETCH-LOGICAL-a675-f872e51c36add112909c1156657d1cb80b5b7bfb2db286f0ee3f40aa72b7a8753</originalsourceid><addsrcrecordid>eNotz71OwzAUhmEvDKhwAUz4BhJsJ8c_IxQKlQJIKHt0HB8jS6lTuSmCuwcK0ze9n_QwdiVF3VoAcYPlM33UqhVtLZxzcM7uetyl_M47XCgv_D7FeDykOfPnOdDE41z4Cx0LTvwNQ8I8Et8kmgLf5j2mvPy0F-ws4nSgy_9dsX7z0K-fqu71cbu-7SrUBqpojSKQY6MxBCmVE26UErQGE-TorfDgjY9eBa-sjoKoia1ANMobtAaaFbv-uz0hhn1JOyxfwy9mOGGab1kYRB8</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Taming Latent Diffusion Model for Neural Radiance Field Inpainting</title><source>arXiv.org</source><creator>Lin, Chieh Hubert ; Kim, Changil ; Huang, Jia-Bin ; Li, Qinbo ; Ma, Chih-Yao ; Kopf, Johannes ; Yang, Ming-Hsuan ; Tseng, Hung-Yu</creator><creatorcontrib>Lin, Chieh Hubert ; Kim, Changil ; Huang, Jia-Bin ; Li, Qinbo ; Ma, Chih-Yao ; Kopf, Johannes ; Yang, Ming-Hsuan ; Tseng, Hung-Yu</creatorcontrib><description>Neural Radiance Field (NeRF) is a representation for 3D reconstruction from
multi-view images. Despite some recent work showing preliminary success in
editing a reconstructed NeRF with diffusion prior, they remain struggling to
synthesize reasonable geometry in completely uncovered regions. One major
reason is the high diversity of synthetic contents from the diffusion model,
which hinders the radiance field from converging to a crisp and deterministic
geometry. Moreover, applying latent diffusion models on real data often yields
a textural shift incoherent to the image condition due to auto-encoding errors.
These two problems are further reinforced with the use of pixel-distance
losses. To address these issues, we propose tempering the diffusion model's
stochasticity with per-scene customization and mitigating the textural shift
with masked adversarial training. During the analyses, we also found the
commonly used pixel and perceptual losses are harmful in the NeRF inpainting
task. Through rigorous experiments, our framework yields state-of-the-art NeRF
inpainting results on various real-world scenes. Project page:
https://hubert0527.github.io/MALD-NeRF</description><identifier>DOI: 10.48550/arxiv.2404.09995</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning</subject><creationdate>2024-04</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2404.09995$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2404.09995$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Lin, Chieh Hubert</creatorcontrib><creatorcontrib>Kim, Changil</creatorcontrib><creatorcontrib>Huang, Jia-Bin</creatorcontrib><creatorcontrib>Li, Qinbo</creatorcontrib><creatorcontrib>Ma, Chih-Yao</creatorcontrib><creatorcontrib>Kopf, Johannes</creatorcontrib><creatorcontrib>Yang, Ming-Hsuan</creatorcontrib><creatorcontrib>Tseng, Hung-Yu</creatorcontrib><title>Taming Latent Diffusion Model for Neural Radiance Field Inpainting</title><description>Neural Radiance Field (NeRF) is a representation for 3D reconstruction from
multi-view images. Despite some recent work showing preliminary success in
editing a reconstructed NeRF with diffusion prior, they remain struggling to
synthesize reasonable geometry in completely uncovered regions. One major
reason is the high diversity of synthetic contents from the diffusion model,
which hinders the radiance field from converging to a crisp and deterministic
geometry. Moreover, applying latent diffusion models on real data often yields
a textural shift incoherent to the image condition due to auto-encoding errors.
These two problems are further reinforced with the use of pixel-distance
losses. To address these issues, we propose tempering the diffusion model's
stochasticity with per-scene customization and mitigating the textural shift
with masked adversarial training. During the analyses, we also found the
commonly used pixel and perceptual losses are harmful in the NeRF inpainting
task. Through rigorous experiments, our framework yields state-of-the-art NeRF
inpainting results on various real-world scenes. Project page:
https://hubert0527.github.io/MALD-NeRF</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz71OwzAUhmEvDKhwAUz4BhJsJ8c_IxQKlQJIKHt0HB8jS6lTuSmCuwcK0ze9n_QwdiVF3VoAcYPlM33UqhVtLZxzcM7uetyl_M47XCgv_D7FeDykOfPnOdDE41z4Cx0LTvwNQ8I8Et8kmgLf5j2mvPy0F-ws4nSgy_9dsX7z0K-fqu71cbu-7SrUBqpojSKQY6MxBCmVE26UErQGE-TorfDgjY9eBa-sjoKoia1ANMobtAaaFbv-uz0hhn1JOyxfwy9mOGGab1kYRB8</recordid><startdate>20240415</startdate><enddate>20240415</enddate><creator>Lin, Chieh Hubert</creator><creator>Kim, Changil</creator><creator>Huang, Jia-Bin</creator><creator>Li, Qinbo</creator><creator>Ma, Chih-Yao</creator><creator>Kopf, Johannes</creator><creator>Yang, Ming-Hsuan</creator><creator>Tseng, Hung-Yu</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240415</creationdate><title>Taming Latent Diffusion Model for Neural Radiance Field Inpainting</title><author>Lin, Chieh Hubert ; Kim, Changil ; Huang, Jia-Bin ; Li, Qinbo ; Ma, Chih-Yao ; Kopf, Johannes ; Yang, Ming-Hsuan ; Tseng, Hung-Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a675-f872e51c36add112909c1156657d1cb80b5b7bfb2db286f0ee3f40aa72b7a8753</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Lin, Chieh Hubert</creatorcontrib><creatorcontrib>Kim, Changil</creatorcontrib><creatorcontrib>Huang, Jia-Bin</creatorcontrib><creatorcontrib>Li, Qinbo</creatorcontrib><creatorcontrib>Ma, 
Chih-Yao</creatorcontrib><creatorcontrib>Kopf, Johannes</creatorcontrib><creatorcontrib>Yang, Ming-Hsuan</creatorcontrib><creatorcontrib>Tseng, Hung-Yu</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lin, Chieh Hubert</au><au>Kim, Changil</au><au>Huang, Jia-Bin</au><au>Li, Qinbo</au><au>Ma, Chih-Yao</au><au>Kopf, Johannes</au><au>Yang, Ming-Hsuan</au><au>Tseng, Hung-Yu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Taming Latent Diffusion Model for Neural Radiance Field Inpainting</atitle><date>2024-04-15</date><risdate>2024</risdate><abstract>Neural Radiance Field (NeRF) is a representation for 3D reconstruction from
multi-view images. Despite some recent work showing preliminary success in
editing a reconstructed NeRF with diffusion prior, they remain struggling to
synthesize reasonable geometry in completely uncovered regions. One major
reason is the high diversity of synthetic contents from the diffusion model,
which hinders the radiance field from converging to a crisp and deterministic
geometry. Moreover, applying latent diffusion models on real data often yields
a textural shift incoherent to the image condition due to auto-encoding errors.
These two problems are further reinforced with the use of pixel-distance
losses. To address these issues, we propose tempering the diffusion model's
stochasticity with per-scene customization and mitigating the textural shift
with masked adversarial training. During the analyses, we also found the
commonly used pixel and perceptual losses are harmful in the NeRF inpainting
task. Through rigorous experiments, our framework yields state-of-the-art NeRF
inpainting results on various real-world scenes. Project page:
https://hubert0527.github.io/MALD-NeRF</abstract><doi>10.48550/arxiv.2404.09995</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2404.09995 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2404_09995 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
title | Taming Latent Diffusion Model for Neural Radiance Field Inpainting |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T23%3A40%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Taming%20Latent%20Diffusion%20Model%20for%20Neural%20Radiance%20Field%20Inpainting&rft.au=Lin,%20Chieh%20Hubert&rft.date=2024-04-15&rft_id=info:doi/10.48550/arxiv.2404.09995&rft_dat=%3Carxiv_GOX%3E2404_09995%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |