DNN Surgery: Accelerating DNN Inference on the Edge Through Layer Partitioning

Recent advances in deep neural networks (DNNs) have substantially improved the accuracy and speed of various intelligent applications. One obstacle remains, however: DNN inference imposes a heavy computation burden on end devices, while offloading inference tasks to the cloud causes a large volume of data transmission. Motivated by the fact that the data size of some intermediate DNN layers is significantly smaller than that of the raw input data, we design DNN Surgery, which allows a partitioned DNN to be processed at both the edge and the cloud while limiting data transmission. The challenge is twofold: (1) network dynamics substantially influence the performance of DNN partitioning, and (2) state-of-the-art DNNs are characterized by a directed acyclic graph (DAG) rather than a chain, which makes partitioning considerably more complicated. To address these issues, we design a Dynamic Adaptive DNN Surgery (DADS) scheme, which optimally partitions the DNN under different network conditions. We also study the partition problem in a cost-constrained system, where the cloud resources available for inference are limited. We then implement a real-world prototype based on a self-driving car video dataset, showing that compared with current approaches, DNN Surgery improves latency by up to 6.45x and throughput by up to 8.31x. We further evaluate DNN Surgery through two case studies, an indoor intrusion detection application and a campus traffic monitoring application, in which it consistently delivers high throughput and low latency.
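To make the partitioning idea concrete, the sketch below picks a split point for a simple chain-structured DNN by trading per-layer edge compute time against transmission time and cloud compute time. This is only a rough illustration under stated assumptions; it is not the DADS algorithm described in the abstract, which handles DAG-structured DNNs under changing network conditions. All layer profiles, data sizes, and the bandwidth figure are hypothetical.

```python
# Minimal sketch (hypothetical values): choose where to split a *chain* of DNN
# layers between an edge device and the cloud so that end-to-end latency is
# minimized. It only illustrates the compute-vs-transmission trade-off behind
# layer partitioning, not the paper's DADS scheme for DAG-structured DNNs.

def best_split(edge_ms, cloud_ms, out_bytes, input_bytes, bandwidth_bps):
    """Return (split, latency_ms), where layers [0, split) run on the edge
    and layers [split, n) run in the cloud. split == 0 offloads everything;
    split == n keeps the whole model on the device."""
    n = len(edge_ms)
    best_idx, best_latency = 0, float("inf")
    for split in range(n + 1):
        edge_time = sum(edge_ms[:split])                 # local compute
        cloud_time = sum(cloud_ms[split:])               # remote compute
        if split == n:
            tx_time = 0.0                                # nothing is sent
        else:
            sent = input_bytes if split == 0 else out_bytes[split - 1]
            tx_time = sent * 8 / bandwidth_bps * 1000.0  # uplink time in ms
        total = edge_time + tx_time + cloud_time
        if total < best_latency:
            best_idx, best_latency = split, total
    return best_idx, best_latency

# Hypothetical 4-layer profile: per-layer latency (ms) and output sizes (bytes).
edge_ms   = [12.0, 25.0, 30.0, 60.0]           # measured on the end device
cloud_ms  = [1.5, 3.0, 3.5, 2.0]               # measured on the cloud GPU
out_bytes = [600_000, 150_000, 40_000, 4_000]  # intermediate feature-map sizes
split, latency = best_split(edge_ms, cloud_ms, out_bytes,
                            input_bytes=1_500_000, bandwidth_bps=10e6)
print(f"best split index: {split}, end-to-end latency ~= {latency:.1f} ms")
```

For a chain, a brute-force scan over the n+1 cut points suffices; for DAG-structured models, a single cut index no longer defines a valid partition, which is exactly the complication the abstract points out.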

Bibliographic Details
Published in: IEEE Transactions on Cloud Computing, 2023-07, Vol. 11 (3), p. 3111-3125
Main authors: Liang, Huanghuang; Sang, Qianlong; Hu, Chuang; Cheng, Dazhao; Zhou, Xiaobo; Wang, Dan; Bao, Wei; Wang, Yu
Format: Article
Language: English
Subjects: Artificial neural networks; Cloud computing; Computation offloading; Data transmission; Deep learning; deep neural networks; Delays; edge computing; Inference; inference acceleration; layer partitioning; Network latency; Neural networks; Surgery; Throughput; Visual analytics
Online access: Order full text
DOI: 10.1109/TCC.2023.3258982
ISSN: 2168-7161
EISSN: 2372-0018
Publisher: IEEE (Piscataway)
Source: IEEE Electronic Library (IEL)