TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Saved in:
Main Authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Abstract: | Developing deep learning models on tiny devices (e.g., microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g., transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer consists of three components: SuperNAS, SparseNAS, and SparseEngine. SuperNAS searches a vast search space for an appropriate supernet. SparseNAS then identifies the best sparse single-path model, including its transformer architecture, from that supernet. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of sparse transformer models on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer develops efficient transformers with an accuracy of $96.1\%$ while meeting hardware constraints of $1$MB storage and $320$KB memory. Additionally, TinyFormer achieves speedups of up to $12.2\times$ in sparse inference compared to the CMSIS-NN library. We expect TinyFormer to bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications. |
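The abstract's three-stage pipeline (SuperNAS, then SparseNAS, then SparseEngine) can be pictured roughly as in the sketch below. This is a hypothetical illustration only: the names (`HardwareBudget`, `supernas`, `sparsenas`, `sparse_engine_deploy`) and the placeholder logic are assumptions made for clarity and are not the paper's actual API or algorithms.

```python
# Hypothetical sketch of the TinyFormer flow described in the abstract.
# All names and logic here are placeholders, not the framework's real interface.
from dataclasses import dataclass

@dataclass
class HardwareBudget:
    storage_bytes: int  # flash budget, e.g. 1 MB on the target MCU
    memory_bytes: int   # SRAM budget, e.g. 320 KB on the target MCU

def supernas(search_space, budget):
    # Stage 1 (placeholder): keep only candidate supernets that fit the
    # storage budget, then pick the one with the largest capacity.
    feasible = [net for net in search_space if net["storage"] <= budget.storage_bytes]
    return max(feasible, key=lambda net: net["capacity"])

def sparsenas(supernet, budget):
    # Stage 2 (placeholder): stand-in for searching the supernet for the best
    # sparse single-path model under the memory budget.
    return {**supernet, "sparse": True, "peak_memory": budget.memory_bytes}

def sparse_engine_deploy(model, target="mcu"):
    # Stage 3 (placeholder): stand-in for generating sparse-inference code
    # for the target MCU.
    return f"deployed sparse={model['sparse']} model to {target}"

budget = HardwareBudget(storage_bytes=1 << 20, memory_bytes=320 << 10)  # 1 MB / 320 KB
space = [
    {"capacity": 1.0, "storage": 900_000},
    {"capacity": 2.0, "storage": 2_000_000},
]
print(sparse_engine_deploy(sparsenas(supernas(space, budget), budget)))
```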
---|---|
DOI: | 10.48550/arxiv.2311.01759 |