FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators

In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic a...

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on very large scale integration (VLSI) systems 2025-01, p.1-14
Main authors: Sun, Hao; Shen, Junzhong; Zhang, Tian; Tang, Zhongyi; Zhang, Changwu; Li, Yuhang; Shi, Yang; Liu, Hengzhu
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 14
container_issue
container_start_page 1
container_title IEEE transactions on very large scale integration (VLSI) systems
container_volume
creator Sun, Hao
Shen, Junzhong
Zhang, Tian
Tang, Zhongyi
Zhang, Changwu
Li, Yuhang
Shi, Yang
Liu, Hengzhu
description In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.
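The abstract's headline sensitivity result — that a MAC delay of 2 or 3 clock cycles materially changes achievable performance — can be illustrated with a deliberately simplified cycle model. The function below is a toy sketch under our own assumptions (output-stationary dataflow, perfect tiling, no memory stalls, a linear pipeline fill/drain term); it is not the FAMS cost model and does not reproduce the paper's simulator.

```python
import math

def systolic_latency(M, K, N, rows, cols, mac_delay=1):
    """Toy cycle estimate for a rows x cols output-stationary systolic
    array computing an (M x K) @ (K x N) matrix multiply.

    Illustrative assumptions only (NOT the FAMS simulator): each output
    tile streams K operand pairs through every PE, each MAC costs
    `mac_delay` cycles, and the array pays one diagonal fill/drain of
    (rows + cols - 2) cycles per tile.
    """
    # Number of output tiles needed to cover the M x N result.
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    # Cycles per tile: K serialized MACs plus pipeline fill/drain.
    per_tile = K * mac_delay + (rows + cols - 2)
    return tiles * per_tile
```

Even in this crude model, raising `mac_delay` from 1 to 2 on a 16x16 array inflates the per-tile cost from 46 to 62 cycles, which is the kind of effect a mapping notation that ignores operation delays cannot express.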
doi_str_mv 10.1109/TVLSI.2024.3522326
format Article
fullrecord <record><control><sourceid>crossref_RIE</sourceid><recordid>TN_cdi_ieee_primary_10843963</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10843963</ieee_id><sourcerecordid>10_1109_TVLSI_2024_3522326</sourcerecordid><originalsourceid>FETCH-LOGICAL-c149t-341d3844aa518bacb178fa2f90e44c26f33b981bde17a28eee6115f8df239dfb3</originalsourceid><addsrcrecordid>eNpNkEFOwzAQRS0EEqVwAcTCF0jx2E5qs4sKhUppWaSwjRzHRoG2rsaRUG7flHbB38yXRu8vHiH3wCYATD-uP4tyMeGMy4lIORc8uyAjSNNpoodcDp1lIlEc2DW5ifGbMZBSsxEp5_myfKI5nWO-db8Bf2jwdOm2Aftk5nYdtpYuzX7f7r6oD0ifV6tIw46WfezCZnjmiKanubVu49B0AeMtufJmE93d-Y7Jx_xlPXtLivfXxSwvEgtSd4mQ0AglpTEpqNrYGqbKG-41c1Jannkhaq2gbhxMDVfOuQwg9arxXOjG12JM-GnXYogRna_22G4N9hWw6qil-tNSHbVUZy0D9HCC2mHwH6Ck0JkQBzbzXqs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><source>IEEE Electronic Library (IEL)</source><creator>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</creator><creatorcontrib>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</creatorcontrib><description>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. 
This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. 
Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2024.3522326</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>IEEE</publisher><subject>Arrays ; Artificial neural networks ; Computer architecture ; Dataflow ; deep neural network (DNN) accelerator ; Delays ; design space exploration ; Energy consumption ; Hardware ; mapping space exploration ; Space exploration ; System-on-chip ; systolic array ; Systolic arrays ; Tensors</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-5786-3171 ; 0000-0002-2938-2714 ; 0000-0002-2918-8647 ; 0000-0001-6233-6800 ; 0000-0003-4912-0364</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10843963$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10843963$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Sun, Hao</creatorcontrib><creatorcontrib>Shen, Junzhong</creatorcontrib><creatorcontrib>Zhang, Tian</creatorcontrib><creatorcontrib>Tang, Zhongyi</creatorcontrib><creatorcontrib>Zhang, Changwu</creatorcontrib><creatorcontrib>Li, Yuhang</creatorcontrib><creatorcontrib>Shi, Yang</creatorcontrib><creatorcontrib>Liu, Hengzhu</creatorcontrib><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><title>IEEE transactions on very large scale integration (VLSI) 
systems</title><addtitle>TVLSI</addtitle><description>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. 
Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</description><subject>Arrays</subject><subject>Artificial neural networks</subject><subject>Computer architecture</subject><subject>Dataflow</subject><subject>deep neural network (DNN) accelerator</subject><subject>Delays</subject><subject>design space exploration</subject><subject>Energy consumption</subject><subject>Hardware</subject><subject>mapping space exploration</subject><subject>Space exploration</subject><subject>System-on-chip</subject><subject>systolic array</subject><subject>Systolic arrays</subject><subject>Tensors</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkEFOwzAQRS0EEqVwAcTCF0jx2E5qs4sKhUppWaSwjRzHRoG2rsaRUG7flHbB38yXRu8vHiH3wCYATD-uP4tyMeGMy4lIORc8uyAjSNNpoodcDp1lIlEc2DW5ifGbMZBSsxEp5_myfKI5nWO-db8Bf2jwdOm2Aftk5nYdtpYuzX7f7r6oD0ifV6tIw46WfezCZnjmiKanubVu49B0AeMtufJmE93d-Y7Jx_xlPXtLivfXxSwvEgtSd4mQ0AglpTEpqNrYGqbKG-41c1Jannkhaq2gbhxMDVfOuQwg9arxXOjG12JM-GnXYogRna_22G4N9hWw6qil-tNSHbVUZy0D9HCC2mHwH6Ck0JkQBzbzXqs</recordid><startdate>20250116</startdate><enddate>20250116</enddate><creator>Sun, Hao</creator><creator>Shen, Junzhong</creator><creator>Zhang, Tian</creator><creator>Tang, Zhongyi</creator><creator>Zhang, Changwu</creator><creator>Li, Yuhang</creator><creator>Shi, Yang</creator><creator>Liu, 
Hengzhu</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-5786-3171</orcidid><orcidid>https://orcid.org/0000-0002-2938-2714</orcidid><orcidid>https://orcid.org/0000-0002-2918-8647</orcidid><orcidid>https://orcid.org/0000-0001-6233-6800</orcidid><orcidid>https://orcid.org/0000-0003-4912-0364</orcidid></search><sort><creationdate>20250116</creationdate><title>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</title><author>Sun, Hao ; Shen, Junzhong ; Zhang, Tian ; Tang, Zhongyi ; Zhang, Changwu ; Li, Yuhang ; Shi, Yang ; Liu, Hengzhu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c149t-341d3844aa518bacb178fa2f90e44c26f33b981bde17a28eee6115f8df239dfb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Arrays</topic><topic>Artificial neural networks</topic><topic>Computer architecture</topic><topic>Dataflow</topic><topic>deep neural network (DNN) accelerator</topic><topic>Delays</topic><topic>design space exploration</topic><topic>Energy consumption</topic><topic>Hardware</topic><topic>mapping space exploration</topic><topic>Space exploration</topic><topic>System-on-chip</topic><topic>systolic array</topic><topic>Systolic arrays</topic><topic>Tensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Hao</creatorcontrib><creatorcontrib>Shen, Junzhong</creatorcontrib><creatorcontrib>Zhang, Tian</creatorcontrib><creatorcontrib>Tang, Zhongyi</creatorcontrib><creatorcontrib>Zhang, Changwu</creatorcontrib><creatorcontrib>Li, Yuhang</creatorcontrib><creatorcontrib>Shi, Yang</creatorcontrib><creatorcontrib>Liu, Hengzhu</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 
1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sun, Hao</au><au>Shen, Junzhong</au><au>Zhang, Tian</au><au>Tang, Zhongyi</au><au>Zhang, Changwu</au><au>Li, Yuhang</au><au>Shi, Yang</au><au>Liu, Hengzhu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2025-01-16</date><risdate>2025</risdate><spage>1</spage><epage>14</epage><pages>1-14</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>In recent years, deep neural networks (DNNs) have experienced rapid development. These DNNs demonstrate significant variations in architecture and scale, creating a substantial demand for domain-specific accelerators that are optimized for both high performance and low energy consumption. Systolic array accelerators, due to their efficient dataflow and parallel processing capabilities, offer significant advantages when performing computations for DNNs. Existing studies frequently overlook various hardware constraints in systolic array accelerators when representing mapping strategies. This oversight includes ignoring the differences in delays between communication and computation operations, as well as overlooking the capacities of multilevel memory hierarchies. Such omissions can lead to inaccuracies in predicting accelerator performance and inefficiencies in system design. We propose the FAMS framework, which introduces a memory-centric notation capable of fully representing the mapping of DNN operations on systolic array accelerators. 
Memory-centric notation moves away from the idealized assumptions of previous notations and considers various hardware constraints, thereby expanding the effective design and mapping spaces. The FAMS framework also includes a cycle-accurate simulator, which takes the hardware configurations, task descriptions, and mapping strategy represented by memory-centric notation as inputs, providing various metrics such as latency and energy consumption. The experimental results demonstrate that our proposed FAMS framework reduces latency by up to 29.7% and increases throughput by 42.4% compared to the state-of-the-art TENET framework. Additionally, under hardware configurations with a MAC delay of 2 and 3 clock cycles, the FAMS framework enhances performance by 12.0% and 25.4%, respectively.</abstract><pub>IEEE</pub><doi>10.1109/TVLSI.2024.3522326</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-5786-3171</orcidid><orcidid>https://orcid.org/0000-0002-2938-2714</orcidid><orcidid>https://orcid.org/0000-0002-2918-8647</orcidid><orcidid>https://orcid.org/0000-0001-6233-6800</orcidid><orcidid>https://orcid.org/0000-0003-4912-0364</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-8210
ispartof IEEE transactions on very large scale integration (VLSI) systems, 2025-01, p.1-14
issn 1063-8210
1557-9999
language eng
recordid cdi_ieee_primary_10843963
source IEEE Electronic Library (IEL)
subjects Arrays
Artificial neural networks
Computer architecture
Dataflow
deep neural network (DNN) accelerator
Delays
design space exploration
Energy consumption
Hardware
mapping space exploration
Space exploration
System-on-chip
systolic array
Systolic arrays
Tensors
title FAMS: A FrAmework of Memory-Centric Mapping for DNNs on Systolic Array Accelerators
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T13%3A20%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FAMS:%20A%20FrAmework%20of%20Memory-Centric%20Mapping%20for%20DNNs%20on%20Systolic%20Array%20Accelerators&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Sun,%20Hao&rft.date=2025-01-16&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2024.3522326&rft_dat=%3Ccrossref_RIE%3E10_1109_TVLSI_2024_3522326%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10843963&rfr_iscdi=true