Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or use a direct 3D diffusion model trained on limited 3D data, which loses generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion model fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated by the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the nearly consistent multi-view normal maps into a single consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that our direct 2.5D generation with the specially designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.
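
The fusion step described in the abstract, recovering geometry by gradient descent against generated normal maps through a differentiable rendering operation, can be illustrated with a small, hypothetical PyTorch sketch. This is not the authors' implementation: the paper fuses multi-view normal maps into a 3D model with its own differentiable rasterization scheme, whereas this toy stands in with a single-view height field whose finite-difference normals are fit to one target normal map; the function heightfield_normals, the Gaussian "bump" target, and all resolutions and hyperparameters are illustrative assumptions.

import torch

def heightfield_normals(z: torch.Tensor) -> torch.Tensor:
    # Per-pixel unit normals of a height field z of shape (H, W) via finite differences.
    dzdx = torch.diff(z, dim=1, append=z[:, -1:])
    dzdy = torch.diff(z, dim=0, append=z[-1:, :])
    n = torch.stack([-dzdx, -dzdy, torch.ones_like(z)], dim=-1)  # (H, W, 3)
    return n / n.norm(dim=-1, keepdim=True)

H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
bump = 0.3 * torch.exp(-8.0 * (xs ** 2 + ys ** 2))   # synthetic "true" surface
target = heightfield_normals(bump).detach()          # stands in for a diffusion-generated normal map

z = torch.zeros(H, W, requires_grad=True)            # geometry parameters being optimized
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = (heightfield_normals(z) - target).square().mean()  # normal-consistency loss
    loss.backward()
    opt.step()

In the paper itself the optimized geometry is a full 3D model rendered from several camera views by a differentiable rasterizer, with the nearly consistent multi-view normal maps as targets; the sketch above only mirrors the overall structure of optimizing geometry against a normal-map loss.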

Bibliographic Details
Published in: arXiv.org, 2024-03
Main authors: Lu, Yuanxun; Zhang, Jingyang; Li, Shiwei; Tian Fang; McKinnon, David; Tsin, Yanghai; Long, Quan; Cao, Xun; Yao, Yao
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Distillation; Image processing; Three dimensional models; Two dimensional models
Online access: Full text