MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
Saved in:

Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract:

Streamlining the deployment of Deep Neural Networks (DNNs) on heterogeneous edge platforms, which couple instruction processors and hardware accelerators for tensor computations within the same microcontroller unit (MCU), is becoming one of the crucial challenges of the TinyML field.

The best-performing DNN compilation toolchains are usually deeply customized for a single MCU family, and porting them to a different heterogeneous MCU family implies labor-intensive redevelopment of almost the entire compiler. At the opposite extreme, retargetable toolchains such as TVM fail to exploit the capabilities of custom accelerators, generating general-purpose but unoptimized code. To overcome this duality, we introduce MATCH, a novel TVM-based DNN deployment framework designed for easy, agile retargeting across different MCU processors and accelerators, thanks to a customizable, model-based hardware abstraction.

We show that a general and retargetable mapping framework enhanced with hardware cost models can compete with, and even outperform, custom toolchains on diverse targets, while only needing the definition of an abstract hardware model and a SoC-specific API.

We tested MATCH on two state-of-the-art heterogeneous MCUs, GAP9 and DIANA. On the four DNN models of the MLPerf Tiny suite, MATCH reduces inference latency by up to 60.88 times on DIANA compared to plain TVM, thanks to the exploitation of the on-board HW accelerator. Compared to HTVM, a fully customized toolchain for DIANA, we still reduce latency by 16.94%. On GAP9, using the same benchmarks, we improve latency by 2.15 times compared to the dedicated DORY compiler, thanks to our heterogeneous DNN mapping approach that synergistically exploits the DNN accelerator and the eight-core cluster available on board.
DOI: 10.48550/arxiv.2410.08855
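The abstract's claim that retargeting only requires "an abstract hardware model and a SoC-specific API" can be pictured with a short sketch. The Python snippet below is a hypothetical illustration, not MATCH's actual API: the class names (`MemoryLevel`, `ComputeUnit`, `HardwareModel`), the example SoC parameters, and the roofline-style cost formula are all invented here to show how a latency cost model over an abstract hardware description could drive the mapping of each DNN operator to a compute unit.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryLevel:
    """One level of the on-chip memory hierarchy (hypothetical abstraction)."""
    name: str
    size_kib: int
    bandwidth_bytes_per_cycle: float

@dataclass
class ComputeUnit:
    """An execution engine on the SoC: a CPU cluster or a tensor accelerator."""
    name: str
    macs_per_cycle: int          # peak multiply-accumulates per cycle
    supported_ops: List[str]     # operator types this unit can execute

@dataclass
class HardwareModel:
    """Abstract model of a heterogeneous MCU, in the spirit of the abstract."""
    units: List[ComputeUnit]
    memories: List[MemoryLevel]

    def cycles(self, op: str, macs: int, bytes_moved: int,
               unit: ComputeUnit) -> float:
        """Roofline-style estimate: the max of compute time and data movement."""
        if op not in unit.supported_ops:
            return float("inf")  # this unit cannot run the operator at all
        compute = macs / unit.macs_per_cycle
        # Assume all traffic goes through the slowest memory level.
        bw = min(m.bandwidth_bytes_per_cycle for m in self.memories)
        transfer = bytes_moved / bw
        return max(compute, transfer)

    def best_unit(self, op: str, macs: int, bytes_moved: int) -> ComputeUnit:
        """Pick the compute unit the cost model predicts to be fastest."""
        return min(self.units,
                   key=lambda u: self.cycles(op, macs, bytes_moved, u))

# Invented GAP9-like target: an 8-core cluster plus a convolution accelerator.
soc = HardwareModel(
    units=[
        ComputeUnit("cluster", macs_per_cycle=16,
                    supported_ops=["conv2d", "dense", "add"]),
        ComputeUnit("accelerator", macs_per_cycle=256,
                    supported_ops=["conv2d"]),
    ],
    memories=[MemoryLevel("L2", size_kib=1536,
                          bandwidth_bytes_per_cycle=8.0)],
)

# A convolution maps to the accelerator; an elementwise add, which the
# accelerator does not support, falls back to the CPU cluster.
print(soc.best_unit("conv2d", macs=2_000_000, bytes_moved=300_000).name)
print(soc.best_unit("add", macs=50_000, bytes_moved=100_000).name)
```

Under this reading, retargeting to a new SoC means writing a new `HardwareModel` instance (plus the SoC-specific runtime calls), while the mapping logic itself stays generic, which is the duality the paper claims to resolve.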