TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
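The high-level operator fusion mentioned in the abstract can be illustrated with a toy sketch. This is plain Python, not TVM's actual API or IR, and the function names `unfused`/`fused` are made up for illustration: fusing a chain of elementwise operators (here, add followed by ReLU) into a single loop avoids materializing a full-size intermediate buffer between them.

```python
def unfused(a, b):
    # Two passes: the intermediate result of `add` is materialized in memory,
    # then traversed again to apply `relu`.
    tmp = [x + y for x, y in zip(a, b)]        # op 1: elementwise add
    return [max(x, 0.0) for x in tmp]          # op 2: elementwise ReLU

def fused(a, b):
    # One pass: each element flows through both ops while still "in register",
    # so no intermediate list is allocated and memory traffic is halved.
    return [max(x + y, 0.0) for x, y in zip(a, b)]

a = [1.0, -2.0, 3.0]
b = [0.5, 0.5, -4.0]
assert unfused(a, b) == fused(a, b) == [1.5, 0.0, 0.0]
```

Both versions compute the same result; the payoff of fusion is reduced memory traffic, which is why the paper applies it at the graph level before generating low-level code.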

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Chen, Tianqi; Moreau, Thierry; Jiang, Ziheng; Zheng, Lianmin; Yan, Eddie; Cowan, Meghan; Shen, Haichen; Wang, Leyuan; Hu, Yuwei; Ceze, Luis; Guestrin, Carlos; Krishnamurthy, Arvind
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Programming Languages
Online Access: https://arxiv.org/abs/1802.04799
Description: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
DOI: 10.48550/arxiv.1802.04799
Format: Article
Published: 2018-02-12
Language: English
Record ID: cdi_arxiv_primary_1802_04799
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence
Computer Science - Learning
Computer Science - Programming Languages
Title: TVM: An Automated End-to-End Optimizing Compiler for Deep Learning