Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence

In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel...

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Open Journal of Circuits and Systems, 2021, Vol. 2, p. 350-362
Main Authors: Cohen, Robert A., Choi, Hyomin, Bajic, Ivan V.
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page 362
container_issue
container_start_page 350
container_title IEEE open journal of circuits and systems
container_volume 2
creator Cohen, Robert A.
Choi, Hyomin
Bajic, Ivan V.
description In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device, and the remaining portion of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN, without requiring any retraining of the network weights. Mathematical models for estimating the clipping and quantization error of ReLU and leaky-ReLU activations at this intermediate layer are developed and used to compute optimal clipping ranges for coarse quantization. We also present a modified entropy-constrained design algorithm for quantizing clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point intermediate activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. When compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding an intermediate layer of a split neural network for edge/cloud applications.
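The description above outlines the codec's pipeline: clip the intermediate activations to a model-derived optimal range, coarsely quantize them, and entropy-code the result. As a rough illustration of the clip-and-quantize stage only, the following NumPy sketch uses a fixed clipping range and a plain uniform quantizer; the function names, the 2-bit setting, and the empirical-entropy printout are illustrative assumptions, not the paper's model-driven range optimization or its modified entropy-constrained quantizer design.

import numpy as np

def clip_and_quantize(features, c_min, c_max, n_bits):
    # Clip a feature tensor to [c_min, c_max] and uniformly quantize it
    # to 2**n_bits levels. In the paper the clipping range comes from an
    # error model; here it is simply a parameter (assumption).
    levels = 2 ** n_bits
    step = (c_max - c_min) / (levels - 1)         # uniform quantizer step
    clipped = np.clip(features, c_min, c_max)     # clipping stage
    indices = np.round((clipped - c_min) / step)  # nearest quantizer index
    return indices.astype(np.int32), step

def dequantize(indices, c_min, step):
    # Reconstruct activation values from quantizer indices.
    return c_min + indices * step

# Toy usage: leaky-ReLU-like activations, 2-bit coarse quantization.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16, 8, 8)).astype(np.float32)
x = np.where(x > 0.0, x, 0.1 * x)                 # leaky ReLU, slope 0.1
idx, step = clip_and_quantize(x, c_min=float(x.min()), c_max=float(x.max()), n_bits=2)
x_hat = dequantize(idx, float(x.min()), step)
print("quantization MSE:", float(np.mean((x - x_hat) ** 2)))

# Empirical entropy of the index stream, a crude proxy for the coded rate
# in bits per feature element (the paper reports 0.6 to 0.8 bits after
# entropy coding; this toy setup will not match that).
_, counts = np.unique(idx, return_counts=True)
p = counts / counts.sum()
print("entropy (bits/element):", float(-(p * np.log2(p)).sum()))

A real deployment along the paper's lines would replace the fixed clipping range with the model-optimized one and the plain uniform quantizer with the entropy-constrained design.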
doi_str_mv 10.1109/OJCAS.2021.3072884
format Article
fulltext fulltext
identifier ISSN: 2644-1225
ispartof IEEE open journal of circuits and systems, 2021, Vol.2, p.350-362
issn 2644-1225
2644-1225
language eng
recordid cdi_crossref_primary_10_1109_OJCAS_2021_3072884
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Algorithms
Artificial neural networks
Cloud computing
Codec
Collaboration
Collaborative intelligence
deep learning
Design modifications
Entropy coding
Estimation
feature compression
Floating point arithmetic
Image coding
Intelligence
Lightweight
Mathematical analysis
Mathematical model
Mathematical models
Measurement
Mobile handsets
neural network compression
Neural networks
quantization
Quantization (signal)
Tensors
title Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T22%3A33%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lightweight%20Compression%20of%20Intermediate%20Neural%20Network%20Features%20for%20Collaborative%20Intelligence&rft.jtitle=IEEE%20open%20journal%20of%20circuits%20and%20systems&rft.au=Cohen,%20Robert%20A.&rft.date=2021&rft.volume=2&rft.spage=350&rft.epage=362&rft.pages=350-362&rft.issn=2644-1225&rft.eissn=2644-1225&rft.coden=IOJCC3&rft_id=info:doi/10.1109/OJCAS.2021.3072884&rft_dat=%3Cproquest_cross%3E2531562130%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2531562130&rft_id=info:pmid/&rft_ieee_id=9430648&rft_doaj_id=oai_doaj_org_article_0bba4d1f95a54dd89e882f8d4bd1d703&rfr_iscdi=true