Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration

The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a...
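
The abstract's point about dot-product length can be illustrated with a small sketch. It counts the input/weight pairs behind one output element for a standard and a depthwise convolution, then estimates how busy a fixed-width MAC unit would be. The 64-lane unit, the one-dot-product-per-unit mapping, and the function names are assumptions for illustration only; they are not taken from the article.

```python
import math


def pairs_per_output(kernel: int, in_channels: int, depthwise: bool) -> int:
    """Number of input/weight pairs in the dot product for one output element."""
    # A depthwise convolution filters each channel on its own, so the dot
    # product covers only the kernel window. A standard convolution also
    # accumulates across all input channels.
    return kernel * kernel if depthwise else kernel * kernel * in_channels


def lane_utilization(pairs: int, lanes: int) -> float:
    """Fraction of MAC lanes doing useful work (hypothetical mapping:
    one dot product at a time is spread over a `lanes`-wide unit)."""
    passes = math.ceil(pairs / lanes)  # cycles needed to cover all pairs
    return pairs / (passes * lanes)


# Assumed example layer: 3x3 kernel, 32 input channels, 64 MAC lanes.
standard = pairs_per_output(kernel=3, in_channels=32, depthwise=False)
depthwise = pairs_per_output(kernel=3, in_channels=32, depthwise=True)

print("standard :", standard, lane_utilization(standard, lanes=64))
print("depthwise:", depthwise, lane_utilization(depthwise, lanes=64))
```

Under these assumed numbers the depthwise layer offers far fewer pairs per dot product than the lanes available, which is the kind of underutilization the abstract describes; real accelerators map work differently, so the figures are only indicative.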

Bibliographic Details
Published in:	IEEE transactions on computer-aided design of integrated circuits and systems 2024-09, Vol.43 (9), p.2661-2673
Main Authors:	Ryu, Sungju; Jang, Jaeyong; Oh, Youngtaek; Kim, Jae-Joon
Format:	Article
Language:	eng
Subjects:	Computational modeling; Computer architecture; Convolution; Dataflow; deep neural network; depthwise convolution; Design automation; distributed memory; hardware accelerator; MobileNet; processing element (PE); Random access memory; Static random access memory; System-on-chip

container_end_page	2673
container_issue	9
container_start_page	2661
container_title	IEEE transactions on computer-aided design of integrated circuits and systems
container_volume	43
creator	Ryu, Sungju; Jang, Jaeyong; Oh, Youngtaek; Kim, Jae-Joon
description	The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of the MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE). By doing so, PEs and SRAMs can communicate with high bandwidth. Our Mobileware architecture shows 1.4× to 29.5× higher throughput than conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.
doi_str_mv	10.1109/TCAD.2024.3380555
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_10477545</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10477545</ieee_id><sourcerecordid>3096079467</sourcerecordid><originalsourceid>FETCH-LOGICAL-c246t-e518236b192aa0e1276f35f89f49c86b6d8f0b00673b1eafbdc19e84a933ce9c3</originalsourceid><addsrcrecordid>eNpNkD1PwzAURS0EEqXwA5AYLDGnPMd2HLNFKV9SgYEixuC4z2qqkBTHUcW_JyUdmN5y7r16h5BLBjPGQN8s82w-iyEWM85TkFIekQnTXEWCSXZMJhCrNAJQcErOum4DwISM9YR8PrdlVePOeLyl86oLvir7gCuaebuuAtrQe6QfVVjTfG2aBmv6Fkyo2sb4Hzo3wbi63VHXejo2vWCgmbVYo__DzsmJM3WHF4c7Je_3d8v8MVq8Pjzl2SKysUhChJKlMU9KpmNjAFmsEselS7UT2qZJmaxSByVAonjJ0LhyZZnGVBjNuUVt-ZRcj71b33732IVi0_a-GSYLDjoBpcWQnRI2Uta3XefRFVtffQ2vFAyKvchiL7LYiywOIofM1ZipEPEfL5SSQvJf0ylv7Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3096079467</pqid></control><display><type>article</type><title>Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration</title><source>IEEE Electronic Library (IEL)</source><creator>Ryu, Sungju ; Jang, Jaeyong ; Oh, Youngtaek ; Kim, Jae-Joon</creator><creatorcontrib>Ryu, Sungju ; Jang, Jaeyong ; Oh, Youngtaek ; Kim, Jae-Joon</creatorcontrib><description><![CDATA[The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of the MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE). By doing so, PEs and SRAMs can communicate with high bandwidth. 
Our Mobileware architecture shows <inline-formula> <tex-math notation="LaTeX">1.4\times </tex-math></inline-formula>-<inline-formula> <tex-math notation="LaTeX">29.5\times </tex-math></inline-formula> higher throughput than conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.]]></description><identifier>ISSN: 0278-0070</identifier><identifier>EISSN: 1937-4151</identifier><identifier>DOI: 10.1109/TCAD.2024.3380555</identifier><identifier>CODEN: ITCSDI</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Computational modeling ; Computer architecture ; Convolution ; Dataflow ; deep neural network ; depthwise convolution ; Design automation ; distributed memory ; hardware accelerator ; MobileNet ; processing element (PE) ; Random access memory ; Static random access memory ; System-on-chip</subject><ispartof>IEEE transactions on computer-aided design of integrated circuits and systems, 2024-09, Vol.43 (9), p.2661-2673</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c246t-e518236b192aa0e1276f35f89f49c86b6d8f0b00673b1eafbdc19e84a933ce9c3</cites><orcidid>0000-0001-5175-8258 ; 0000-0002-0254-391X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10477545$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10477545$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Ryu, Sungju</creatorcontrib><creatorcontrib>Jang, Jaeyong</creatorcontrib><creatorcontrib>Oh, Youngtaek</creatorcontrib><creatorcontrib>Kim, Jae-Joon</creatorcontrib><title>Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration</title><title>IEEE transactions on computer-aided design of integrated circuits and systems</title><addtitle>TCAD</addtitle><description><![CDATA[The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of the MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE). By doing so, PEs and SRAMs can communicate with high bandwidth. 
Our Mobileware architecture shows <inline-formula> <tex-math notation="LaTeX">1.4\times </tex-math></inline-formula>-<inline-formula> <tex-math notation="LaTeX">29.5\times </tex-math></inline-formula> higher throughput than conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.]]></description><subject>Computational modeling</subject><subject>Computer architecture</subject><subject>Convolution</subject><subject>Dataflow</subject><subject>deep neural network</subject><subject>depthwise convolution</subject><subject>Design automation</subject><subject>distributed memory</subject><subject>hardware accelerator</subject><subject>MobileNet</subject><subject>processing element (PE)</subject><subject>Random access memory</subject><subject>Static random access memory</subject><subject>System-on-chip</subject><issn>0278-0070</issn><issn>1937-4151</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkD1PwzAURS0EEqXwA5AYLDGnPMd2HLNFKV9SgYEixuC4z2qqkBTHUcW_JyUdmN5y7r16h5BLBjPGQN8s82w-iyEWM85TkFIekQnTXEWCSXZMJhCrNAJQcErOum4DwISM9YR8PrdlVePOeLyl86oLvir7gCuaebuuAtrQe6QfVVjTfG2aBmv6Fkyo2sb4Hzo3wbi63VHXejo2vWCgmbVYo__DzsmJM3WHF4c7Je_3d8v8MVq8Pjzl2SKysUhChJKlMU9KpmNjAFmsEselS7UT2qZJmaxSByVAonjJ0LhyZZnGVBjNuUVt-ZRcj71b33732IVi0_a-GSYLDjoBpcWQnRI2Uta3XefRFVtffQ2vFAyKvchiL7LYiywOIofM1ZipEPEfL5SSQvJf0ylv7Q</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Ryu, Sungju</creator><creator>Jang, Jaeyong</creator><creator>Oh, Youngtaek</creator><creator>Kim, Jae-Joon</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-5175-8258</orcidid><orcidid>https://orcid.org/0000-0002-0254-391X</orcidid></search><sort><creationdate>20240901</creationdate><title>Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration</title><author>Ryu, Sungju ; Jang, Jaeyong ; Oh, Youngtaek ; Kim, Jae-Joon</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c246t-e518236b192aa0e1276f35f89f49c86b6d8f0b00673b1eafbdc19e84a933ce9c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computational modeling</topic><topic>Computer architecture</topic><topic>Convolution</topic><topic>Dataflow</topic><topic>deep neural network</topic><topic>depthwise convolution</topic><topic>Design automation</topic><topic>distributed memory</topic><topic>hardware accelerator</topic><topic>MobileNet</topic><topic>processing element (PE)</topic><topic>Random access memory</topic><topic>Static random access memory</topic><topic>System-on-chip</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ryu, Sungju</creatorcontrib><creatorcontrib>Jang, Jaeyong</creatorcontrib><creatorcontrib>Oh, Youngtaek</creatorcontrib><creatorcontrib>Kim, Jae-Joon</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on computer-aided design of integrated circuits and systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Ryu, Sungju</au><au>Jang, Jaeyong</au><au>Oh, Youngtaek</au><au>Kim, Jae-Joon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration</atitle><jtitle>IEEE transactions on computer-aided design of integrated circuits and systems</jtitle><stitle>TCAD</stitle><date>2024-09-01</date><risdate>2024</risdate><volume>43</volume><issue>9</issue><spage>2661</spage><epage>2673</epage><pages>2661-2673</pages><issn>0278-0070</issn><eissn>1937-4151</eissn><coden>ITCSDI</coden><abstract><![CDATA[The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of the MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE). By doing so, PEs and SRAMs can communicate with high bandwidth. 
Our Mobileware architecture shows <inline-formula> <tex-math notation="LaTeX">1.4\times </tex-math></inline-formula>-<inline-formula> <tex-math notation="LaTeX">29.5\times </tex-math></inline-formula> higher throughput than conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCAD.2024.3380555</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-5175-8258</orcidid><orcidid>https://orcid.org/0000-0002-0254-391X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0278-0070
ispartof	IEEE transactions on computer-aided design of integrated circuits and systems, 2024-09, Vol.43 (9), p.2661-2673
issn	0278-0070 1937-4151
language	eng
recordid	cdi_ieee_primary_10477545
source	IEEE Electronic Library (IEL)
subjects	Computational modeling; Computer architecture; Convolution; Dataflow; deep neural network; depthwise convolution; Design automation; distributed memory; hardware accelerator; MobileNet; processing element (PE); Random access memory; Static random access memory; System-on-chip
title	Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A39%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobileware:%20Distributed%20Architecture%20With%20Channel%20Stationary%20Dataflow%20for%20MobileNet%20Acceleration&rft.jtitle=IEEE%20transactions%20on%20computer-aided%20design%20of%20integrated%20circuits%20and%20systems&rft.au=Ryu,%20Sungju&rft.date=2024-09-01&rft.volume=43&rft.issue=9&rft.spage=2661&rft.epage=2673&rft.pages=2661-2673&rft.issn=0278-0070&rft.eissn=1937-4151&rft.coden=ITCSDI&rft_id=info:doi/10.1109/TCAD.2024.3380555&rft_dat=%3Cproquest_RIE%3E3096079467%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3096079467&rft_id=info:pmid/&rft_ieee_id=10477545&rfr_iscdi=true