DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating unbounded flythrough scenes from textual prompts. Different from other methods that focus on warping images frame by frame, we advocate explicitly warping the intermediate latent code of the pre-trained text-to-imag...
Main authors: | Kong, Hanyang; Lian, Dongze; Mi, Michael Bi; Wang, Xinchao |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Kong, Hanyang ; Lian, Dongze ; Mi, Michael Bi ; Wang, Xinchao |
description | We introduce DreamDrone, a novel zero-shot and training-free pipeline for
generating unbounded flythrough scenes from textual prompts. Different from
other methods that focus on warping images frame by frame, we advocate
explicitly warping the intermediate latent code of the pre-trained
text-to-image diffusion model for high-quality image generation and
generalization ability. To further enhance the fidelity of the generated
images, we also propose a feature-correspondence-guidance diffusion process and
a high-pass filtering strategy to promote geometric consistency and
high-frequency detail consistency, respectively. Extensive experiments reveal
that DreamDrone significantly surpasses existing methods, delivering highly
authentic scene generation with exceptional visual quality, without training or
fine-tuning on datasets or reconstructing 3D point clouds in advance. |
doi_str_mv | 10.48550/arxiv.2312.08746 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2312_08746</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2312_08746</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2312_087463</originalsourceid><addsrcrecordid>eNqFjrsKwkAQAK-xEPUDrNwfuJinBlvPVyGIBAubsOBGD5Jc2Lto_Hsx2FtNMzAjxDTwvThNEn-O3OmnF0ZB6PnpMl4MxVkxYaXY1LSCjDonnZGHCu8EShdFa7Wp4WhuVFpAJrgSG2kfxsGJuCHXYgkXTS_YUU2MzrAdi0GBpaXJjyMx226y9V729bxhXSG_8-9F3l9E_40PRK88_g</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators</title><source>arXiv.org</source><creator>Kong, Hanyang ; Lian, Dongze ; Mi, Michael Bi ; Wang, Xinchao</creator><creatorcontrib>Kong, Hanyang ; Lian, Dongze ; Mi, Michael Bi ; Wang, Xinchao</creatorcontrib><description>We introduce DreamDrone, a novel zero-shot and training-free pipeline for
generating unbounded flythrough scenes from textual prompts. Different from
other methods that focus on warping images frame by frame, we advocate
explicitly warping the intermediate latent code of the pre-trained
text-to-image diffusion model for high-quality image generation and
generalization ability. To further enhance the fidelity of the generated
images, we also propose a feature-correspondence-guidance diffusion process and
a high-pass filtering strategy to promote geometric consistency and
high-frequency detail consistency, respectively. Extensive experiments reveal
that DreamDrone significantly surpasses existing methods, delivering highly
authentic scene generation with exceptional visual quality, without training or
fine-tuning on datasets or reconstructing 3D point clouds in advance.</description><identifier>DOI: 10.48550/arxiv.2312.08746</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2312.08746$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2312.08746$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Kong, Hanyang</creatorcontrib><creatorcontrib>Lian, Dongze</creatorcontrib><creatorcontrib>Mi, Michael Bi</creatorcontrib><creatorcontrib>Wang, Xinchao</creatorcontrib><title>DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators</title><description>We introduce DreamDrone, a novel zero-shot and training-free pipeline for
generating unbounded flythrough scenes from textual prompts. Different from
other methods that focus on warping images frame by frame, we advocate
explicitly warping the intermediate latent code of the pre-trained
text-to-image diffusion model for high-quality image generation and
generalization ability. To further enhance the fidelity of the generated
images, we also propose a feature-correspondence-guidance diffusion process and
a high-pass filtering strategy to promote geometric consistency and
high-frequency detail consistency, respectively. Extensive experiments reveal
that DreamDrone significantly surpasses existing methods, delivering highly
authentic scene generation with exceptional visual quality, without training or
fine-tuning on datasets or reconstructing 3D point clouds in advance.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjrsKwkAQAK-xEPUDrNwfuJinBlvPVyGIBAubsOBGD5Jc2Lto_Hsx2FtNMzAjxDTwvThNEn-O3OmnF0ZB6PnpMl4MxVkxYaXY1LSCjDonnZGHCu8EShdFa7Wp4WhuVFpAJrgSG2kfxsGJuCHXYgkXTS_YUU2MzrAdi0GBpaXJjyMx226y9V729bxhXSG_8-9F3l9E_40PRK88_g</recordid><startdate>20231214</startdate><enddate>20231214</enddate><creator>Kong, Hanyang</creator><creator>Lian, Dongze</creator><creator>Mi, Michael Bi</creator><creator>Wang, Xinchao</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20231214</creationdate><title>DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators</title><author>Kong, Hanyang ; Lian, Dongze ; Mi, Michael Bi ; Wang, Xinchao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2312_087463</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Kong, Hanyang</creatorcontrib><creatorcontrib>Lian, Dongze</creatorcontrib><creatorcontrib>Mi, Michael Bi</creatorcontrib><creatorcontrib>Wang, Xinchao</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kong, Hanyang</au><au>Lian, Dongze</au><au>Mi, Michael Bi</au><au>Wang, Xinchao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View 
Generators</atitle><date>2023-12-14</date><risdate>2023</risdate><abstract>We introduce DreamDrone, a novel zero-shot and training-free pipeline for
generating unbounded flythrough scenes from textual prompts. Different from
other methods that focus on warping images frame by frame, we advocate
explicitly warping the intermediate latent code of the pre-trained
text-to-image diffusion model for high-quality image generation and
generalization ability. To further enhance the fidelity of the generated
images, we also propose a feature-correspondence-guidance diffusion process and
a high-pass filtering strategy to promote geometric consistency and
high-frequency detail consistency, respectively. Extensive experiments reveal
that DreamDrone significantly surpasses existing methods, delivering highly
authentic scene generation with exceptional visual quality, without training or
fine-tuning on datasets or reconstructing 3D point clouds in advance.</abstract><doi>10.48550/arxiv.2312.08746</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2312.08746 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2312_08746 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T16%3A03%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DreamDrone:%20Text-to-Image%20Diffusion%20Models%20are%20Zero-shot%20Perpetual%20View%20Generators&rft.au=Kong,%20Hanyang&rft.date=2023-12-14&rft_id=info:doi/10.48550/arxiv.2312.08746&rft_dat=%3Carxiv_GOX%3E2312_08746%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |