MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
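
The fusion step described in the abstract lends itself to a short illustration. The following is a minimal sketch (not the authors' released code) of weighted voting over a UV stack followed by region unification; the array shapes, the per-view confidence weights, and the part-mask input are assumptions made for the example.

```python
# Minimal illustrative sketch of the fusion described in the abstract:
# per-view material label maps unprojected into UV space are fused by
# weighted voting, then each object part is unified to its majority label.
# Shapes, weighting, and the part-mask input are assumptions, not the
# authors' implementation.
import numpy as np

def fuse_uv_stack(uv_labels, uv_weights, num_materials):
    """uv_labels: (V, H, W) int material ids per view, -1 where a texel
    is not observed from that view.
    uv_weights: (V, H, W) float per-view confidence (e.g., view coverage).
    Returns an (H, W) fused material label map."""
    _, H, W = uv_labels.shape
    votes = np.zeros((num_materials, H, W), dtype=np.float64)
    for labels, weights in zip(uv_labels, uv_weights):
        valid = labels >= 0
        rows, cols = np.nonzero(valid)
        # Each view adds its confidence to the bin of the label it predicts.
        np.add.at(votes, (labels[valid], rows, cols), weights[valid])
    return votes.argmax(axis=0)  # unvoted texels default to material 0

def unify_regions(fused, part_masks):
    """Give every texel of a part that part's majority material label."""
    out = fused.copy()
    for mask in part_masks:          # boolean (H, W) masks, one per part
        if mask.any():
            out[mask] = np.bincount(fused[mask]).argmax()
    return out
```

In the pipeline the abstract outlines, the per-view label maps would come from the 2D material segmentation prior rendered from sampled camera poses; here they are placeholder arrays used only to show the voting and unification logic.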

Bibliographic Details
Main Authors: Li, Zeyu; Gan, Ruitong; Luo, Chuanchen; Wang, Yuxi; Liu, Jiaheng; Zhang, Ziwei Zhu Man; Li, Qing; Yin, Xucheng; Zhang, Zhaoxiang; Peng, Junran
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Li, Zeyu; Gan, Ruitong; Luo, Chuanchen; Wang, Yuxi; Liu, Jiaheng; Zhang, Ziwei Zhu Man; Li, Qing; Yin, Xucheng; Zhang, Zhaoxiang; Peng, Junran
description Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
doi_str_mv 10.48550/arxiv.2404.13923
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.13923
language eng
recordid cdi_arxiv_primary_2404_13923
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T06%3A27%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MaterialSeg3D:%20Segmenting%20Dense%20Materials%20from%202D%20Priors%20for%203D%20Assets&rft.au=Li,%20Zeyu&rft.date=2024-04-22&rft_id=info:doi/10.48550/arxiv.2404.13923&rft_dat=%3Carxiv_GOX%3E2404_13923%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true