Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Format: | Article |
Language: | English |
Abstract: | Hybrid architectures that combine Convolutional Neural Networks (CNNs)
and Vision Transformers (ViTs) have emerged as a groundbreaking approach,
pushing the boundaries of computer vision (CV). This comprehensive review
examines the literature on state-of-the-art hybrid CNN-ViT architectures and
explores the synergies between the two approaches. The survey covers:
(1) background on the vanilla CNN and ViT; (2) a systematic review of hybrid
designs, organized by taxonomy, to explore the synergy achieved by merging CNN
and ViT models; (3) a comparative analysis of different hybrid architectures and
their synergies on specific application tasks; (4) challenges and future
directions for hybrid models; and (5) a concluding summary of key findings and
recommendations. Through this exploration of hybrid CV architectures, the survey
aims to serve as a guiding resource, fostering a deeper understanding of the
intricate dynamics between CNNs and ViTs and their collective impact on shaping
the future of CV architectures. |
DOI: | 10.48550/arxiv.2402.02941 |