Discrete Wavelet Transform Meets Transformer: Unleashing the Full Potential of the Transformer for Visual Recognition
| Published in: | IEEE Access, 2023, Vol. 11, pp. 102430-102443 |
|---|---|
| Main Authors: | , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Full text |
| Abstract: | Traditionally, the success of the Transformer has been attributed to its token mixer, particularly the self-attention mechanism. However, recent studies suggest that replacing such an attention-based token mixer with alternative techniques can yield comparable results on various vision tasks. This highlights that the model's overall architectural structure, rather than the specific choice of token mixer alone, is what drives optimal performance. Building on this insight, we introduce the Discrete Wavelet TransFormer, a framework that incorporates the Discrete Wavelet Transform to elevate all the building blocks of the Transformer. By exploiting distinct attributes of the Discrete Wavelet Transform, the Discrete Wavelet TransFormer not only strengthens the network's ability to learn intricate feature representations across different levels of abstraction, but also enables lossless down-sampling, promoting a more resilient and efficient network. A comprehensive evaluation on diverse vision tasks demonstrates that the Discrete Wavelet TransFormer outperforms state-of-the-art Transformer-based models across all tasks by a significant margin. |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2023.3316144 |
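
The abstract's claim of lossless down-sampling follows from the invertibility of the 2-D Discrete Wavelet Transform: a single DWT level turns an H×W feature map into four H/2×W/2 sub-bands that jointly preserve every input value, so halving the spatial resolution discards nothing. The sketch below is a minimal NumPy illustration of this round-trip property using the Haar wavelet; it is a hypothetical example, not the authors' implementation, and the function names `haar_dwt2`/`haar_idwt2` are assumptions for this sketch.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT: split an (H, W) array into four
    (H/2, W/2) sub-bands. H and W must be even."""
    a = x[0::2, 0::2]  # top-left value of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass: scaled local average
    lh = (a + b - c - d) / 2.0  # row-difference detail
    hl = (a - b + c - d) / 2.0  # column-difference detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse: rebuild the original (H, W) array from the sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

x = np.random.rand(8, 8)
bands = haar_dwt2(x)
# The four half-resolution sub-bands reconstruct the input exactly,
# which is why DWT-based down-sampling is lossless.
assert np.allclose(haar_idwt2(*bands), x)
```

In a Transformer backbone, the four sub-bands can be stacked along the channel axis, trading spatial resolution for channels without the information loss of strided convolution or pooling; how the paper wires this into each building block is detailed in the full text.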