MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets
Format: Article
Language: English
Abstract: Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effects of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
DOI: 10.48550/arxiv.2404.13923
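
The weighted-voting fusion of the UV stack described in the abstract can be illustrated with a minimal sketch. The function name, array layout, confidence weighting (e.g., a per-texel view-angle cosine), and the ignore-label convention below are assumptions for illustration only, not the authors' released implementation.

```python
import numpy as np

def fuse_uv_stack(label_stack, weight_stack, num_classes, ignore_label=-1):
    """Fuse per-view UV material label maps by weighted voting.

    label_stack:  (V, H, W) int array of material labels per view,
                  with `ignore_label` marking texels not seen from that view.
    weight_stack: (V, H, W) float array of per-texel confidence weights
                  (e.g., cosine between the view ray and the surface normal).
    Returns an (H, W) int array holding the winning label per texel.
    """
    V, H, W = label_stack.shape
    votes = np.zeros((num_classes, H, W), dtype=np.float32)

    for v in range(V):
        labels, weights = label_stack[v], weight_stack[v]
        valid = labels != ignore_label
        rows, cols = np.nonzero(valid)
        # Accumulate each view's weight into the bin of the label it predicts.
        np.add.at(votes, (labels[valid], rows, cols), weights[valid])

    fused = votes.argmax(axis=0)
    # Texels never observed from any viewpoint keep the ignore label.
    fused[votes.sum(axis=0) == 0] = ignore_label
    return fused
```

A subsequent region-unification pass, as mentioned in the abstract, could then assign each connected UV region of an object part its majority fused label to keep parts coherent; that step is omitted here.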