TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
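The high-level operator fusion mentioned in the abstract can be illustrated with a toy sketch. This is plain Python, not TVM's actual API or IR, and the function names `unfused`/`fused` are made up for illustration: fusing a chain of elementwise operators (here, add followed by ReLU) into a single loop avoids materializing a full-size intermediate buffer between them.

```python
def unfused(a, b):
    # Two passes: the intermediate result of `add` is materialized in memory,
    # then traversed again to apply `relu`.
    tmp = [x + y for x, y in zip(a, b)]        # op 1: elementwise add
    return [max(x, 0.0) for x in tmp]          # op 2: elementwise ReLU

def fused(a, b):
    # One pass: each element flows through both ops while still "in register",
    # so no intermediate list is allocated and memory traffic is halved.
    return [max(x + y, 0.0) for x, y in zip(a, b)]

a = [1.0, -2.0, 3.0]
b = [0.5, 0.5, -4.0]
assert unfused(a, b) == fused(a, b) == [1.5, 0.0, 0.0]
```

Both versions compute the same result; the payoff of fusion is reduced memory traffic, which is why the paper applies it at the graph level before generating low-level code.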

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Chen, Tianqi; Moreau, Thierry; Jiang, Ziheng; Zheng, Lianmin; Yan, Eddie; Cowan, Meghan; Shen, Haichen; Wang, Leyuan; Hu, Yuwei; Ceze, Luis; Guestrin, Carlos; Krishnamurthy, Arvind
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Programming Languages
Online Access: https://arxiv.org/abs/1802.04799
Description: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
DOI: 10.48550/arxiv.1802.04799
Format: Article
Published: 2018-02-12
Language: English
Record ID: cdi_arxiv_primary_1802_04799
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Learning
Computer Science - Programming Languages
Title: TVM: An Automated End-to-End Optimizing Compiler for Deep Learning