CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

In this paper, we propose a novel language-guided 3D arbitrary neural style transfer method (CLIP3Dstyler). We aim at stylizing any 3D scene with an arbitrary style from a text description and synthesizing the novel stylized view, which is more flexible than image-conditioned style transfer. Compared with the previous 2D method CLIPStyler, we are able to stylize a 3D scene and generalize to novel scenes without re-training our model. A straightforward solution is to combine previous image-conditioned 3D style transfer and text-conditioned 2D style transfer methods. However, such a solution cannot achieve our goal due to two main challenges. First, there is no multi-modal model matching point clouds and language at different feature scales (low-level, high-level). Second, we observe a style mixing issue when we stylize the content with different style conditions from text prompts. To address the first issue, we propose a 3D stylization framework to match point cloud features with text features in local and global views. For the second issue, we propose an improved directional divergence loss that makes arbitrary text styles more distinguishable, as a complement to our framework. We conduct extensive experiments to show the effectiveness of our model on text-guided 3D scene style transfer.
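The directional divergence loss mentioned in the abstract extends the directional CLIP loss popularized by CLIPStyler. As rough orientation, here is a minimal PyTorch sketch of that baseline directional loss; the function name and tensor arguments are illustrative assumptions, and the paper's improved divergence term for separating different style conditions is not reproduced here.

```python
# Minimal sketch of a CLIPStyler-style directional loss, which CLIP3Dstyler
# builds on. The exact "improved directional divergence loss" is defined in
# the paper; names and shapes below are illustrative assumptions.
import torch
import torch.nn.functional as F

def directional_clip_loss(img_feat_stylized: torch.Tensor,
                          img_feat_content: torch.Tensor,
                          txt_feat_style: torch.Tensor,
                          txt_feat_source: torch.Tensor) -> torch.Tensor:
    """Penalize misalignment between the image-space edit direction and the
    text-space style direction. All inputs are CLIP embeddings, (batch, dim)."""
    delta_img = img_feat_stylized - img_feat_content  # how the rendering changed
    delta_txt = txt_feat_style - txt_feat_source      # how the prompt says it should change
    delta_img = F.normalize(delta_img, dim=-1)
    delta_txt = F.normalize(delta_txt, dim=-1)
    # 1 - cosine similarity: zero when the two directions agree.
    return (1.0 - (delta_img * delta_txt).sum(dim=-1)).mean()

# Usage with dummy embeddings (real ones would come from a CLIP encoder):
feats = [torch.randn(4, 512) for _ in range(4)]
print(directional_clip_loss(*feats))
```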

Bibliographic Details
Main Authors: Gao, Ming; Xu, YanWu; Zhao, Yang; Hou, Tingbo; Zhao, Chenkai; Gong, Mingming
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2305.15732
Published: 2023-05-25
Source: arXiv.org
Online Access: Full text at https://arxiv.org/abs/2305.15732