Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or use a direct 3D diffusion model trained on limited 3D data, which loses generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion model fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated by the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the nearly consistent multi-view normal maps into a single consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that our direct 2.5D generation with the specially designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.
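
The fusion step described in the abstract, recovering geometry by gradient descent against generated normal maps through a differentiable rendering operation, can be illustrated with a small, hypothetical PyTorch sketch. This is not the authors' implementation: the paper fuses multi-view normal maps into a 3D model with its own differentiable rasterization scheme, whereas this toy stands in with a single-view height field whose finite-difference normals are fit to one target normal map; the function heightfield_normals, the Gaussian "bump" target, and all resolutions and hyperparameters are illustrative assumptions.

import torch

def heightfield_normals(z: torch.Tensor) -> torch.Tensor:
    # Per-pixel unit normals of a height field z of shape (H, W) via finite differences.
    dzdx = torch.diff(z, dim=1, append=z[:, -1:])
    dzdy = torch.diff(z, dim=0, append=z[-1:, :])
    n = torch.stack([-dzdx, -dzdy, torch.ones_like(z)], dim=-1)  # (H, W, 3)
    return n / n.norm(dim=-1, keepdim=True)

H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
bump = 0.3 * torch.exp(-8.0 * (xs ** 2 + ys ** 2))   # synthetic "true" surface
target = heightfield_normals(bump).detach()          # stands in for a diffusion-generated normal map

z = torch.zeros(H, W, requires_grad=True)            # geometry parameters being optimized
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = (heightfield_normals(z) - target).square().mean()  # normal-consistency loss
    loss.backward()
    opt.step()

In the paper itself the optimized geometry is a full 3D model rendered from several camera views by a differentiable rasterizer, with the nearly consistent multi-view normal maps as targets; the sketch above only mirrors the overall structure of optimizing geometry against a normal-map loss.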

Bibliographic Details
Published in: arXiv.org, 2024-03
Main authors: Lu, Yuanxun; Zhang, Jingyang; Li, Shiwei; Tian Fang; McKinnon, David; Tsin, Yanghai; Long, Quan; Cao, Xun; Yao, Yao
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Distillation; Image processing; Three dimensional models; Two dimensional models
Online access: Full text