S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Transformer's recent integration into style transfer leverages its
proficiency in establishing long-range dependencies, albeit at the expense of
attenuated local modeling. This paper introduces Strips Window Attention
Transformer (S2WAT), a novel hierarchical vision transformer designed for style
transfer. S2WAT employs attention computation in diverse window shapes to
capture both short- and long-range dependencies. The merged dependencies
utilize the "Attn Merge" strategy, which adaptively determines spatial weights
based on their relevance to the target. Extensive experiments on representative
datasets show the proposed method's effectiveness compared to state-of-the-art
(SOTA) transformer-based and other approaches. The code and pre-trained models
are available at https://github.com/AlienZhang1996/S2WAT. |
DOI: | 10.48550/arxiv.2210.12381 |