ASC: Adaptive Scale Feature Map Compression for Deep Neural Network
Deep-learning accelerators are increasingly in demand; however, their performance is constrained by the size of the feature map, leading to high bandwidth requirements and large buffer sizes. We propose an adaptive scale feature map compression technique leveraging the unique properties of the feature map. This technique adopts independent channel indexing given the weak channel correlation and utilizes a cubical-like block shape to benefit from strong local correlations. The method further optimizes compression using a switchable endpoint mode and adaptive scale interpolation to handle unimodal data distributions, both with and without outliers. This results in 4× and up to 7.69× compression rates for 16-bit data at constant and variable bitrates, respectively. Our hardware design minimizes area cost by adjusting interpolation scales, which facilitates hardware sharing among interpolation points. Additionally, we introduce a threshold concept for straightforward interpolation, avoiding the need for intricate hardware. The TSMC 28nm implementation shows an equivalent gate count of 6135 for the 8-bit version. Furthermore, the hardware architecture scales effectively, with only a sublinear increase in area cost: a 32× throughput increase meets the theoretical bandwidth of DDR5-6400 at just 7.65× the hardware cost.
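The endpoint-plus-interpolation coding the abstract describes can be illustrated with a toy block codec. The Python sketch below is a minimal illustration, not the paper's ASC design: the block size (32), index width (3 bits), and min/max endpoint selection are all assumptions, chosen so the arithmetic matches the quoted constant 4× rate for 16-bit data (2×16 endpoint bits + 32×3 index bits = 128 bits, versus 32×16 = 512 bits raw).

```python
# Toy endpoint-plus-interpolation block codec, illustrating the style of
# scheme the abstract describes. NOT the paper's ASC design: block size,
# index width, and min/max endpoints are assumptions for illustration.
import numpy as np

BLOCK_SIZE = 32      # elements per block (assumption)
INDEX_BITS = 3       # bits per interpolation index (assumption)
LEVELS = 2 ** INDEX_BITS

def compress_block(x: np.ndarray):
    """Encode one block as two 16-bit endpoints + one 3-bit index per element.

    Fixed cost: 2*16 + BLOCK_SIZE*3 = 128 bits vs. BLOCK_SIZE*16 = 512 bits
    raw, i.e. the constant 4x rate quoted in the abstract for 16-bit data.
    """
    lo, hi = int(x.min()), int(x.max())
    span = max(hi - lo, 1)                     # avoid divide-by-zero on flat blocks
    # Snap each element to the nearest of LEVELS points between the endpoints.
    idx = np.rint((x.astype(np.int64) - lo) * (LEVELS - 1) / span).astype(np.uint8)
    return lo, hi, idx

def decompress_block(lo: int, hi: int, idx: np.ndarray) -> np.ndarray:
    span = max(hi - lo, 1)
    vals = lo + idx.astype(np.int64) * span / (LEVELS - 1)
    return np.rint(vals).astype(np.uint16)

# Usage on a unimodal toy block (a clipped Gaussian stands in for feature-map data).
rng = np.random.default_rng(0)
block = np.clip(rng.normal(800, 120, BLOCK_SIZE), 0, 65535).astype(np.uint16)
lo, hi, idx = compress_block(block)
restored = decompress_block(lo, hi, idx)
print("max abs error:", int(np.abs(restored.astype(np.int64) - block.astype(np.int64)).max()))
```

Note how a single outlier in a block stretches the endpoints and wastes index levels on empty range; that is the failure mode the abstract's switchable endpoint mode and adaptive scale interpolation are aimed at.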
Saved in:
Published in: | IEEE transactions on circuits and systems. I, Regular papers, 2024-03, Vol.71 (3), p.1417-1428 |
---|---|
Main Authors: | Yao, Yuan; Chang, Tian-Sheuan |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Bandwidths; Compression; Compressive strength; Correlation; deep learning; Feature maps; Gate counting; Hardware; hardware acceleration; Image coding; Image color analysis; Indexing; Interpolation; Machine learning; Outliers (statistics); Shape |
ISSN: | 1549-8328 |
EISSN: | 1558-0806 |
DOI: | 10.1109/TCSI.2023.3337283 |
Publisher: | New York: IEEE |
Source: | IEEE Electronic Library (IEL) |
Online Access: | Order full text |