PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects
container_end_page | |
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Garg, Raveesh ; Kwon, Hyoukjun ; Qin, Eric ; Chen, Yu-Hsin ; Krishna, Tushar ; Lai, Liangzhen |
description | Because of the recent trend of Deep Neural Network (DNN) models being
memory-bound, inter-operator pipelining for DNN accelerators is emerging as a
promising optimization. Inter-operator pipelining reduces costly on-chip global
memory and off-chip memory accesses by forwarding the output of a layer as the
input of the next layer within the compute array, which previous works have
shown to be an effective optimization.
However, the design space of inter-operator pipelining is huge and not yet
fully explored. In particular, identifying the right depth and granularity of
pipelining (or no pipelining at all) depends heavily on the layer shapes and on
the data volumes of weights and activations, which differ even within a single
domain.
Moreover, prior works divide the substrate into large chunks and map one layer
onto each chunk, which requires communicating halfway across the substrate or
through the global buffer. For fine-grained inter-operation pipelining, however,
placing the consumer tile of the next layer close to the producer tile of the
current layer is a better way to exploit fine-grained spatial reuse.
In order to support a variable number of layers (i.e., the right pipeline
depth) and multiple spatial organizations of layers (in accordance with the
pipelining granularity) on the substrate, we propose PipeOrgan, a new class of
spatial data organization strategies for energy-efficient and congestion-free
communication between the PEs across pipeline depths and granularities.
PipeOrgan takes advantage of flexible spatial organization and can allocate
layers to PEs based on the granularity of pipelining. We also propose changes
to the conventional mesh topology to improve the performance of coarse-grained
allocation. PipeOrgan achieves a 1.95x performance improvement over the
state-of-the-art pipelined dataflow on XR-bench workloads. |
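The placement claim in the abstract (a consumer tile placed next to its producer tile shortens on-chip communication) can be made concrete with a small sketch. The following Python snippet is illustrative only and is not taken from the paper or its artifact: the 8x8 mesh size, the two placement policies (half-and-half vs. column-interleaved), and the Manhattan-hop cost model are assumptions chosen purely to show why fine-grained, producer-adjacent placement shortens hop distance compared to coarse chunk-per-layer allocation.

```python
# Illustrative sketch (assumptions, not the paper's method): compare the average
# producer -> nearest-consumer hop distance on a mesh of PEs for two ways of
# placing two pipelined layers.

def coarse_placement(rows, cols):
    """Chunk-per-layer allocation: layer 0 on the left half, layer 1 on the right half."""
    half = cols // 2
    layer0 = [(r, c) for r in range(rows) for c in range(half)]
    layer1 = [(r, c) for r in range(rows) for c in range(half, cols)]
    return layer0, layer1

def fine_placement(rows, cols):
    """Interleave layers column by column so every producer PE has an adjacent consumer PE."""
    layer0 = [(r, c) for r in range(rows) for c in range(cols) if c % 2 == 0]
    layer1 = [(r, c) for r in range(rows) for c in range(cols) if c % 2 == 1]
    return layer0, layer1

def avg_hops(producers, consumers):
    """Average Manhattan distance from each producer PE to its nearest consumer PE."""
    total = 0
    for pr, pc in producers:
        total += min(abs(pr - cr) + abs(pc - cc) for cr, cc in consumers)
    return total / len(producers)

if __name__ == "__main__":
    rows, cols = 8, 8
    for name, placement in [("coarse", coarse_placement), ("fine", fine_placement)]:
        l0, l1 = placement(rows, cols)
        print(f"{name:6s} placement: avg producer->consumer hops = {avg_hops(l0, l1):.2f}")
```

On this toy mesh the coarse placement averages 2.5 hops per forwarded tile while the interleaved placement averages 1 hop, which matches the abstract's claim that producer-adjacent placement better exploits fine-grained spatial reuse.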
doi_str_mv | 10.48550/arxiv.2405.01736 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2405.01736 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2405_01736 |
source | arXiv.org |
subjects | Computer Science - Hardware Architecture |
title | PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T10%3A57%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PipeOrgan:%20Efficient%20Inter-operation%20Pipelining%20with%20Flexible%20Spatial%20Organization%20and%20Interconnects&rft.au=Garg,%20Raveesh&rft.date=2024-05-02&rft_id=info:doi/10.48550/arxiv.2405.01736&rft_dat=%3Carxiv_GOX%3E2405_01736%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |