Efficiency 360: Efficient Vision Transformers
Saved in:

Main authors: ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Summary: Transformers are widely used for solving tasks in the natural language processing, computer vision, speech, and music domains. In this paper, we discuss the efficiency of transformers in terms of memory (the number of parameters), computation cost (the number of floating-point operations), and model performance, including accuracy, robustness, and fair & bias-free behavior. We mainly discuss the vision transformer for the image classification task. Our contribution is to introduce an Efficiency 360 framework, which covers various aspects of the vision transformer, to make it more efficient for industrial applications. By considering these applications, we categorize the aspects of efficiency along multiple dimensions such as privacy, robustness, transparency, fairness, inclusiveness, continual learning, probabilistic models, approximation, computational complexity, and spectral complexity. We compare various vision transformer models based on their performance, number of parameters, and number of floating-point operations (FLOPs) on multiple datasets.
DOI: 10.48550/arxiv.2302.08374
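The summary's comparison of vision transformer models by parameter count and FLOPs can be made concrete with a back-of-the-envelope calculation. The sketch below is not taken from the paper; the function name, default hyperparameters (a plain ViT-Base/16-style encoder), and the accounting convention (each multiply-add counted as two FLOPs) are illustrative assumptions.

```python
# Minimal sketch (not from the paper): estimate the parameter count and
# forward-pass FLOPs of a plain ViT encoder from its hyperparameters.
# Defaults approximate ViT-Base/16; each multiply-add is counted as 2 FLOPs.

def vit_params_and_flops(image_size=224, patch_size=16, dim=768,
                         depth=12, mlp_ratio=4, num_classes=1000):
    tokens = (image_size // patch_size) ** 2 + 1            # patches + [CLS]

    # Patch projection (3*P*P -> dim) plus bias.
    patch_embed = 3 * patch_size * patch_size * dim + dim

    # Per encoder block: QKV + output projection, a 2-layer MLP, two LayerNorms.
    attn_params = 4 * dim * dim + 4 * dim
    mlp_params = 2 * mlp_ratio * dim * dim + (mlp_ratio + 1) * dim
    block_params = attn_params + mlp_params + 4 * dim

    head_params = dim * num_classes + num_classes            # classifier head
    params = (patch_embed + tokens * dim                     # + pos. embeddings
              + depth * block_params + head_params)

    # Attention: QKV/output projections plus score and weighted-sum matmuls.
    attn_flops = 2 * tokens * (4 * dim * dim) + 4 * tokens * tokens * dim
    # MLP: two dense layers of width mlp_ratio * dim.
    mlp_flops = 2 * tokens * (2 * mlp_ratio * dim * dim)
    # Classifier applied to the [CLS] token only.
    flops = depth * (attn_flops + mlp_flops) + 2 * dim * num_classes

    return params, flops

p, f = vit_params_and_flops()  # ViT-Base/16-like settings
print(f"~{p / 1e6:.1f}M params, ~{f / 1e9:.1f} GFLOPs per image")
```

With these defaults the estimate comes out to roughly 86M parameters and about 35 GFLOPs (around 17.5 GMACs), which is in line with the figures commonly reported for ViT-Base/16; papers that count one FLOP per multiply-add will quote roughly half the FLOPs value.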