MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
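
The fusion step described in the abstract lends itself to a short illustration. The following is a minimal sketch (not the authors' released code) of weighted voting over a UV stack followed by region unification; the array shapes, the per-view confidence weights, and the part-mask input are assumptions made for the example.

```python
# Minimal illustrative sketch of the fusion described in the abstract:
# per-view material label maps unprojected into UV space are fused by
# weighted voting, then each object part is unified to its majority label.
# Shapes, weighting, and the part-mask input are assumptions, not the
# authors' implementation.
import numpy as np

def fuse_uv_stack(uv_labels, uv_weights, num_materials):
    """uv_labels: (V, H, W) int material ids per view, -1 where a texel
    is not observed from that view.
    uv_weights: (V, H, W) float per-view confidence (e.g., view coverage).
    Returns an (H, W) fused material label map."""
    _, H, W = uv_labels.shape
    votes = np.zeros((num_materials, H, W), dtype=np.float64)
    for labels, weights in zip(uv_labels, uv_weights):
        valid = labels >= 0
        rows, cols = np.nonzero(valid)
        # Each view adds its confidence to the bin of the label it predicts.
        np.add.at(votes, (labels[valid], rows, cols), weights[valid])
    return votes.argmax(axis=0)  # unvoted texels default to material 0

def unify_regions(fused, part_masks):
    """Give every texel of a part that part's majority material label."""
    out = fused.copy()
    for mask in part_masks:          # boolean (H, W) masks, one per part
        if mask.any():
            out[mask] = np.bincount(fused[mask]).argmax()
    return out
```

In the pipeline the abstract outlines, the per-view label maps would come from the 2D material segmentation prior rendered from sampled camera poses; here they are placeholder arrays used only to show the voting and unification logic.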

Bibliographic Details
Main Authors: Li, Zeyu; Gan, Ruitong; Luo, Chuanchen; Wang, Yuxi; Liu, Jiaheng; Zhang, Ziwei Zhu Man; Li, Qing; Yin, Xucheng; Zhang, Zhaoxiang; Peng, Junran
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Li, Zeyu; Gan, Ruitong; Luo, Chuanchen; Wang, Yuxi; Liu, Jiaheng; Zhang, Ziwei Zhu Man; Li, Qing; Yin, Xucheng; Zhang, Zhaoxiang; Peng, Junran
description Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
doi_str_mv 10.48550/arxiv.2404.13923
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2404.13923
language eng
recordid cdi_arxiv_primary_2404_13923
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-04T06%3A27%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MaterialSeg3D:%20Segmenting%20Dense%20Materials%20from%202D%20Priors%20for%203D%20Assets&rft.au=Li,%20Zeyu&rft.date=2024-04-22&rft_id=info:doi/10.48550/arxiv.2404.13923&rft_dat=%3Carxiv_GOX%3E2404_13923%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true