PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects

Because of the recent trend of Deep Neural Network (DNN) models being memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and off-chip memory accesses by forwarding the output of a layer as the input of the next layer within the compute array, which previous works have shown to be an effective optimization. However, the design space of inter-operator pipelining is huge and not yet fully explored. In particular, identifying the right depth and granularity of pipelining (or no pipelining at all) depends heavily on the layer shapes and the data volumes of weights and activations, which differ even within a single domain. Moreover, prior works divide the substrate into large chunks and map one layer onto each chunk, which requires communicating halfway across the array or through the global buffer. For fine-grained inter-operation pipelining, however, placing the consumer tile of the next layer close to the corresponding producer tile of the current layer is a better way to exploit fine-grained spatial reuse. To support a variable number of layers (i.e., the right depth) and multiple spatial organizations of layers (in accordance with the pipelining granularity) on the substrate, we propose PipeOrgan, a new class of spatial data organization strategy for energy-efficient and congestion-free communication between the PEs for various pipeline depths and granularities. PipeOrgan takes advantage of flexible spatial organization and can allocate layers to PEs based on the granularity of pipelining. We also propose changes to the conventional mesh topology to improve the performance of coarse-grained allocation. PipeOrgan achieves a 1.95x performance improvement over the state-of-the-art pipelined dataflow on XR-bench workloads.
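To make the placement intuition from the abstract concrete, the following minimal Python sketch (not taken from the paper; the 8x8 PE array, the one-to-one tile pairing, and the hop metric are illustrative assumptions) contrasts a coarse-grained split of the array between a producer layer and a consumer layer with a fine-grained interleaved placement, and reports the average Manhattan hop distance from each producer tile to its consumer tile.

```python
# Illustrative sketch only (not PipeOrgan's actual allocation algorithm).
# It contrasts a coarse-grained split of a PE array between two pipelined
# layers with a fine-grained interleaved placement, and measures the average
# Manhattan distance a producer tile's output travels to reach its consumer.
# The 8x8 array size and the one-to-one tile pairing are assumptions.

ROWS, COLS = 8, 8  # toy PE array dimensions


def coarse_grained(rows, cols):
    """Producer layer occupies the left half of the array, consumer layer the right half."""
    producers = [(r, c) for r in range(rows) for c in range(cols // 2)]
    consumers = [(r, c + cols // 2) for r in range(rows) for c in range(cols // 2)]
    return list(zip(producers, consumers))


def fine_grained(rows, cols):
    """Producer and consumer tiles are interleaved column by column,
    so every consumer PE sits directly next to its producer PE."""
    return [((r, c), (r, c + 1)) for r in range(rows) for c in range(0, cols, 2)]


def avg_hops(pairs):
    """Average Manhattan hop count from producer PE to consumer PE."""
    return sum(abs(pr - cr) + abs(pc - cc) for (pr, pc), (cr, cc) in pairs) / len(pairs)


if __name__ == "__main__":
    print("coarse-grained avg hops:", avg_hops(coarse_grained(ROWS, COLS)))  # 4.0
    print("fine-grained avg hops:  ", avg_hops(fine_grained(ROWS, COLS)))    # 1.0
```

Under these toy assumptions, the interleaved placement cuts the average producer-to-consumer distance from half the array width down to a single hop, which is the kind of fine-grained spatial reuse the abstract argues for.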

Bibliographic Details

Published in: arXiv.org, 2024-05
Main authors: Garg, Raveesh; Kwon, Hyoukjun; Qin, Eric; Chen, Yu-Hsin; Krishna, Tushar; Lai, Liangzhen
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Artificial neural networks; Chips (memory devices); Communication; Finite element method; Optimization; Performance enhancement; Spatial data; Substrates; Topology
Online access: Full text