Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration

The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE), so PEs and SRAMs can communicate with high bandwidth. The Mobileware architecture shows 1.4×–29.5× higher throughput than a conventional weight-stationary hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.
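
To make the low-utilization argument concrete, the sketch below compares the multiply-accumulate (MAC) counts of a standard convolution and a depthwise separable convolution for a single layer. The layer shapes are hypothetical and not taken from the paper; only the well-known MobileNet cost formulas are assumed.

```python
# Illustrative MAC-count comparison (layer shapes are hypothetical;
# only the standard MobileNet cost formulas are assumed).

def standard_conv_macs(h, w, c_in, c_out, k):
    # Standard convolution: every output pixel does a K*K*C_in dot product
    # for each of the C_out output channels.
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    # Depthwise step: one K*K filter per input channel (no cross-channel sum).
    depthwise = h * w * c_in * k * k
    # Pointwise (1x1) step: mixes channels with a C_in-long dot product.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # hypothetical layer
    std = standard_conv_macs(h, w, c_in, c_out, k)
    dws = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard:  {std:,} MACs")
    print(f"separable: {dws:,} MACs  ({std / dws:.1f}x fewer)")
    # The depthwise step's dot products are only K*K = 9 elements long,
    # which is why a wide MAC array sized for standard convolution
    # sits mostly idle on these layers.
```

The short, per-channel dot products of the depthwise step are what leave a conventional weight-stationary MAC array underutilized; this is the gap the channel stationary dataflow targets.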

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE transactions on computer-aided design of integrated circuits and systems 2024-09, Vol.43 (9), p.2661-2673
Main Authors: Ryu, Sungju, Jang, Jaeyong, Oh, Youngtaek, Kim, Jae-Joon
Format: Article
Language: eng
Subjects:
Online Access: Order full text
container_end_page 2673
container_issue 9
container_start_page 2661
container_title IEEE transactions on computer-aided design of integrated circuits and systems
container_volume 43
creator Ryu, Sungju
Jang, Jaeyong
Oh, Youngtaek
Kim, Jae-Joon
description The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the conventional standard convolution, and a smaller number of input/weight pairs are used for a dot product, thereby leading to extremely low MAC utilization. This article proposes a Mobileware architecture for the high-performance acceleration of the MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers, and the distributed SRAMs are placed near each processing element (PE). By doing so, PEs and SRAMs can communicate with high bandwidth. Our Mobileware architecture shows 1.4×–29.5× higher throughput than conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.
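
The description above summarizes the channel stationary idea: on-chip buffers are distributed so that each PE works out of an SRAM placed next to it. The sketch below is a minimal software model of that layout, assuming a simple round-robin assignment of channels to PEs; the class names, the partitioning, and the buffer model are illustrative assumptions, not the Mobileware implementation.

```python
# Conceptual sketch of a distributed, channel-stationary layout
# (class names and the round-robin channel partitioning are
#  illustrative assumptions, not the paper's design).

import numpy as np

class PE:
    """Processing element with a small local SRAM-like buffer."""
    def __init__(self, channel_ids):
        self.channel_ids = channel_ids      # channels kept stationary here
        self.local_buffer = {}              # stands in for the near-PE SRAM

    def load_channels(self, ifmap):
        # Each PE loads only its own channels, so later accesses stay local.
        for c in self.channel_ids:
            self.local_buffer[c] = ifmap[c]

    def depthwise_conv(self, weights, k=3):
        # 3x3 depthwise convolution over the locally buffered channels.
        outputs = {}
        for c in self.channel_ids:
            h, w = self.local_buffer[c].shape
            x = np.pad(self.local_buffer[c], 1)   # zero padding for 'same' output
            out = np.zeros((h, w))
            for i in range(h):
                for j in range(w):
                    out[i, j] = np.sum(x[i:i + k, j:j + k] * weights[c])
            outputs[c] = out
        return outputs

# Hypothetical usage: 16 channels spread round-robin across 4 PEs.
ifmap = np.random.rand(16, 8, 8)
weights = np.random.rand(16, 3, 3)
pes = [PE(list(range(p, 16, 4))) for p in range(4)]
for pe in pes:
    pe.load_channels(ifmap)
    _ = pe.depthwise_conv(weights)
```

Keeping a channel's activations resident next to the PE that consumes them is what gives the PE-to-SRAM path its high bandwidth in this model; how Mobileware actually partitions channels and sizes the buffers is described in the full article.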
doi_str_mv 10.1109/TCAD.2024.3380555
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 0278-0070
ispartof IEEE transactions on computer-aided design of integrated circuits and systems, 2024-09, Vol.43 (9), p.2661-2673
issn 0278-0070
1937-4151
language eng
recordid cdi_ieee_primary_10477545
source IEEE Electronic Library (IEL)
subjects Computational modeling
Computer architecture
Convolution
Dataflow
deep neural network
depthwise convolution
Design automation
distributed memory
hardware accelerator
MobileNet
processing element (PE)
Random access memory
Static random access memory
System-on-chip
title Mobileware: Distributed Architecture With Channel Stationary Dataflow for MobileNet Acceleration
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T13%3A39%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobileware:%20Distributed%20Architecture%20With%20Channel%20Stationary%20Dataflow%20for%20MobileNet%20Acceleration&rft.jtitle=IEEE%20transactions%20on%20computer-aided%20design%20of%20integrated%20circuits%20and%20systems&rft.au=Ryu,%20Sungju&rft.date=2024-09-01&rft.volume=43&rft.issue=9&rft.spage=2661&rft.epage=2673&rft.pages=2661-2673&rft.issn=0278-0070&rft.eissn=1937-4151&rft.coden=ITCSDI&rft_id=info:doi/10.1109/TCAD.2024.3380555&rft_dat=%3Cproquest_RIE%3E3096079467%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3096079467&rft_id=info:pmid/&rft_ieee_id=10477545&rfr_iscdi=true