CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

In this paper, we propose a novel language-guided 3D arbitrary neural style transfer method (CLIP3Dstyler). We aim at stylizing any 3D scene with an arbitrary style from a text description and synthesizing the novel stylized view, which is more flexible than image-conditioned style transfer. Compared with the previous 2D method CLIPStyler, we are able to stylize a 3D scene and generalize to novel scenes without re-training our model. A straightforward solution is to combine previous image-conditioned 3D style transfer and text-conditioned 2D style transfer methods. However, such a solution cannot achieve our goal due to two main challenges. First, there is no multi-modal model matching point clouds and language at different feature scales (low-level, high-level). Second, we observe a style mixing issue when we stylize the content with different style conditions from text prompts. To address the first issue, we propose a 3D stylization framework to match point cloud features with text features in local and global views. For the second issue, we propose an improved directional divergence loss that makes arbitrary text styles more distinguishable, as a complement to our framework. We conduct extensive experiments to show the effectiveness of our model on text-guided 3D scene style transfer.
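The directional divergence loss mentioned in the abstract extends the directional CLIP loss popularized by CLIPStyler. As rough orientation, here is a minimal PyTorch sketch of that baseline directional loss; the function name and tensor arguments are illustrative assumptions, and the paper's improved divergence term for separating different style conditions is not reproduced here.

```python
# Minimal sketch of a CLIPStyler-style directional loss, which CLIP3Dstyler
# builds on. The exact "improved directional divergence loss" is defined in
# the paper; names and shapes below are illustrative assumptions.
import torch
import torch.nn.functional as F

def directional_clip_loss(img_feat_stylized: torch.Tensor,
                          img_feat_content: torch.Tensor,
                          txt_feat_style: torch.Tensor,
                          txt_feat_source: torch.Tensor) -> torch.Tensor:
    """Penalize misalignment between the image-space edit direction and the
    text-space style direction. All inputs are CLIP embeddings, (batch, dim)."""
    delta_img = img_feat_stylized - img_feat_content  # how the rendering changed
    delta_txt = txt_feat_style - txt_feat_source      # how the prompt says it should change
    delta_img = F.normalize(delta_img, dim=-1)
    delta_txt = F.normalize(delta_txt, dim=-1)
    # 1 - cosine similarity: zero when the two directions agree.
    return (1.0 - (delta_img * delta_txt).sum(dim=-1)).mean()

# Usage with dummy embeddings (real ones would come from a CLIP encoder):
feats = [torch.randn(4, 512) for _ in range(4)]
print(directional_clip_loss(*feats))
```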

Bibliographic Details
Main Authors: Gao, Ming; Xu, YanWu; Zhao, Yang; Hou, Tingbo; Zhao, Chenkai; Gong, Mingming
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2305.15732
Published: 2023-05-25
Source: arXiv.org
Online Access: Full text at https://arxiv.org/abs/2305.15732