Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich details without any 3D data. However, 2D lifting methods suffer from inherent view-agnostic ambiguity, thereby leading to serious multi-face Janus issues, where text prompts fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: a structural guidance for geometric fidelity and a semantic guidance for 3D coherence. Employing the two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of our Sherpa3D over the state-of-the-art text-to-3D methods in terms of quality and 3D consistency.
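
To make the two-guidance idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of how a coarse 3D prior could steer an SDS-style 2D-lifting optimization. This is not the authors' implementation: the stand-in denoiser, the normal/feature tensors, and the loss weights are all illustrative assumptions; Sherpa3D's actual guidance terms are defined in the paper itself.

```python
# Illustrative sketch only, NOT the authors' released code. It mirrors the
# abstract's idea: an SDS-style 2D-lifting update combined with a structural
# (geometry) term and a semantic (coherence) term derived from a coarse 3D
# prior. All names, shapes, and weights are hypothetical placeholders.
import torch
import torch.nn.functional as F

def sds_gradient(render, denoiser, sigma=0.5):
    """Simplified score-distillation term: nudge the rendered view toward
    images the 2D diffusion model finds likely (timestep weighting omitted)."""
    noise = torch.randn_like(render)
    pred = denoiser(render + sigma * noise)
    return pred - noise

def structural_guidance(rendered_normals, prior_normals):
    """Hypothetical geometric-fidelity term: penalize deviation of rendered
    surface normals from normals rendered off the coarse 3D prior."""
    return F.mse_loss(rendered_normals, prior_normals)

def semantic_guidance(view_features, prior_features):
    """Hypothetical 3D-coherence term: keep per-view high-level features
    (e.g. from a CLIP-like encoder) aligned with the coarse prior's."""
    cos = F.cosine_similarity(view_features.flatten(1),
                              prior_features.flatten(1), dim=1)
    return (1.0 - cos).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    render = torch.randn(1, 3, 64, 64, requires_grad=True)  # one rendered view
    prior_nrm = torch.randn(1, 3, 64, 64)                   # stand-in prior normals
    denoiser = lambda x: 0.1 * x                            # stand-in 2D diffusion model
    # Guidance losses backpropagate into the rendered view...
    loss = (structural_guidance(render, prior_nrm)
            + 0.1 * semantic_guidance(render, prior_nrm))   # hypothetical weight
    loss.backward()
    # ...and are combined with the score-distillation gradient.
    grad_sds = sds_gradient(render.detach(), denoiser)
    total_grad = render.grad + grad_sds
    print("guided gradient shape:", tuple(total_grad.shape))
```

In a real pipeline the gradient would flow through a differentiable renderer into the 3D representation's parameters rather than into a raw image tensor; the sketch stops at the image level to stay self-contained.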

Bibliographic Details
Published in: arXiv.org, 2023-12
Main Authors: Liu, Fangfu; Wu, Diankun; Wei, Yi; Rao, Yongming; Duan, Yueqi
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Accuracy; Consistency; Distillation; Three dimensional models; Two dimensional models
Online Access: Full text
Full-text link (library resolver): https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T15%3A26%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Sherpa3D:%20Boosting%20High-Fidelity%20Text-to-3D%20Generation%20via%20Coarse%203D%20Prior&rft.jtitle=arXiv.org&rft.au=Liu,%20Fangfu&rft.date=2023-12-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2900744388%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2900744388&rft_id=info:pmid/&rfr_iscdi=true