TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Main authors: | , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art, hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies. |
DOI: | 10.48550/arxiv.1802.04799 |
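The abstract's first-listed optimization, high-level operator fusion, can be illustrated with a minimal sketch in plain Python. This is not TVM's API (all function names below are illustrative); it only shows the idea the paper automates: merging adjacent operators into one loop so the intermediate buffer, and the memory traffic it costs, disappears.

```python
# Unfused pipeline: each operator materializes an intermediate list,
# which on real hardware means an extra round trip through memory.
def scale(xs, a):
    return [a * x for x in xs]

def relu(xs):
    return [x if x > 0 else 0.0 for x in xs]

def unfused(xs, a):
    return relu(scale(xs, a))  # writes, then re-reads, an intermediate buffer

# Fused pipeline: one pass, no intermediate -- the kind of graph-level
# rewrite a compiler like TVM performs automatically.
def fused(xs, a):
    out = []
    for x in xs:
        y = a * x
        out.append(y if y > 0 else 0.0)
    return out

data = [-1.0, 0.5, 2.0]
assert unfused(data, 3.0) == fused(data, 3.0)  # same result, fewer passes
```

Both versions compute `relu(a * x)` elementwise; the fused form simply avoids allocating and traversing the intermediate list, which is where the performance gain comes from on bandwidth-bound operators.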