S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Transformer's recent integration into style transfer leverages its proficiency in establishing long-range dependencies, albeit at the expense of attenuated local modeling. This paper introduces Strips Window Attention Transformer (S2WAT), a novel hierarchical vision transformer designed for style transfer. S2WAT employs attention computation in diverse window shapes to capture both short- and long-range dependencies. The merged dependencies utilize the "Attn Merge" strategy, which adaptively determines spatial weights based on their relevance to the target. Extensive experiments on representative datasets show the proposed method's effectiveness compared to state-of-the-art (SOTA) transformer-based and other approaches. The code and pre-trained models are available at https://github.com/AlienZhang1996/S2WAT.
Saved in:
Main authors: | Zhang, Chiyu; Xu, Xiaogang; Wang, Lei; Dai, Zaiyan; Yang, Jun |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
creator | Zhang, Chiyu; Xu, Xiaogang; Wang, Lei; Dai, Zaiyan; Yang, Jun |
description | Transformer's recent integration into style transfer leverages its
proficiency in establishing long-range dependencies, albeit at the expense of
attenuated local modeling. This paper introduces Strips Window Attention
Transformer (S2WAT), a novel hierarchical vision transformer designed for style
transfer. S2WAT employs attention computation in diverse window shapes to
capture both short- and long-range dependencies. The merged dependencies
utilize the "Attn Merge" strategy, which adaptively determines spatial weights
based on their relevance to the target. Extensive experiments on representative
datasets show the proposed method's effectiveness compared to state-of-the-art
(SOTA) transformer-based and other approaches. The code and pre-trained models
are available at https://github.com/AlienZhang1996/S2WAT. |
format | Article |
identifier | DOI: 10.48550/arxiv.2210.12381 |
language | eng |
recordid | cdi_arxiv_primary_2210_12381 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention |
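
The abstract describes two mechanisms: self-attention computed inside strip-shaped windows of varying shape, and an adaptive "Attn Merge" that weights the resulting dependency maps by their relevance to each spatial position. The sketch below is a minimal, hypothetical PyTorch rendering based only on the abstract, not the authors' implementation (see the linked repository for that); the window sizes, the single-head attention, and the softmax-scored per-pixel merge are all assumptions.

```python
# Illustrative sketch only -- not the authors' code. Assumes feature-map
# dimensions divisible by the window size, single-head attention, and a
# softmax over per-branch relevance scores as a stand-in for "Attn Merge".
import torch
import torch.nn as nn
import torch.nn.functional as F


def strip_attention(x, strip_h, strip_w):
    """Self-attention computed independently inside each (strip_h x strip_w)
    window of a (B, H, W, C) feature map. Horizontal strips use strip_h <
    strip_w, vertical strips the reverse, square windows use equal sides."""
    B, H, W, C = x.shape
    # Partition into non-overlapping strip windows: (B*nWin, strip_h*strip_w, C).
    x = x.view(B, H // strip_h, strip_h, W // strip_w, strip_w, C)
    win = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, strip_h * strip_w, C)
    # Scaled dot-product attention within each window.
    attn = torch.softmax(win @ win.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = attn @ win
    # Reverse the partition back to (B, H, W, C).
    out = out.view(B, H // strip_h, W // strip_w, strip_h, strip_w, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class AttnMergeBlock(nn.Module):
    """Runs attention in three window shapes and merges the outputs with
    adaptive spatial weights -- a hypothetical analogue of "Attn Merge"."""

    def __init__(self, dim, window=8):
        super().__init__()
        self.window = window
        self.score = nn.Linear(dim, 3)  # per-pixel relevance of each branch

    def forward(self, x):  # x: (B, H, W, C)
        w = self.window
        branches = torch.stack([
            strip_attention(x, w, w),           # square windows: local detail
            strip_attention(x, w, x.shape[2]),  # horizontal strips: long range in W
            strip_attention(x, x.shape[1], w),  # vertical strips: long range in H
        ], dim=-2)                              # (B, H, W, 3, C)
        # Softmax over branches gives each pixel its own mixing weights.
        weights = F.softmax(self.score(x), dim=-1).unsqueeze(-1)  # (B, H, W, 3, 1)
        return (weights * branches).sum(dim=-2)  # adaptively weighted merge


if __name__ == "__main__":
    x = torch.randn(1, 32, 32, 64)
    print(AttnMergeBlock(64)(x).shape)  # torch.Size([1, 32, 32, 64])
```

Stacking the three branch outputs and weighting them per pixel keeps the merge differentiable and lets the network favor square windows where fine local texture matters and strip windows where long-range structure dominates, which matches the short- and long-range trade-off the abstract highlights.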