Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment

The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To this end, collaborative deep inference is proposed, in which the DNN is divided into two parts and executed on the end device and the cloud, respectively.

Detailed description

Bibliographic details
Published in: IEEE transactions on mobile computing 2024-12, Vol.23 (12), p.13076-13094
Main authors: Li, Yiran, Liu, Zhen, Kou, Ze, Wang, Yannan, Zhang, Guoqiang, Li, Yidong, Sun, Yongqi
Format: Magazine article
Language: eng
Subjects:
Online access: Order full text
container_end_page 13094
container_issue 12
container_start_page 13076
container_title IEEE transactions on mobile computing
container_volume 23
creator Li, Yiran
Liu, Zhen
Kou, Ze
Wang, Yannan
Zhang, Guoqiang
Li, Yidong
Sun, Yongqi
description The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To this end, collaborative deep inference is proposed, in which the DNN is divided into two parts and executed on the end device and the cloud, respectively. Selecting the DNN partition point is the key challenge in realizing end-cloud collaborative deep inference, especially in mobile environments with unstable networks. In this paper, we propose a Real-time Adaptive Partition (RAP) framework, in which a fast split-point decision algorithm realizes real-time adaptive DNN model partition in the mobile network, performing a weighted joint optimization of DNN quantization loss, inference latency, and transmission latency. We further propose a Joint Multi-user Model Partition and Resource Allocation (JM-MPRA) algorithm under the RAP framework, which aims to guarantee optimized latency, accuracy, and resource utilization in the multi-user scenario. Experimental evaluations demonstrate the effectiveness of RAP with JM-MPRA in improving the performance of real-time end-cloud collaborative inference in both stable and unstable mobile networks. Compared with state-of-the-art methods, the proposed approaches achieve up to a 5.06× reduction in inference latency and a 1.52% improvement in inference accuracy.
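The abstract describes picking a split point by a weighted joint optimization over quantization loss, on-device and cloud inference latency, and transmission latency. A minimal illustrative sketch of that idea follows; the function names, cost weights, and all numbers are assumptions for illustration, not the paper's actual RAP or JM-MPRA implementation:

```python
def transmission_latency(activation_bytes, bandwidth_bps):
    """Time to upload the intermediate activation tensor to the cloud."""
    return activation_bytes * 8 / bandwidth_bps

def choose_split_point(device_lat, cloud_lat, activation_bytes,
                       quant_loss, bandwidth_bps, alpha=1.0, beta=10.0):
    """Return the layer index minimizing a weighted joint cost.

    device_lat[k]      : latency of running layers before split k on the device
    cloud_lat[k]       : latency of running the remaining layers on the cloud
    activation_bytes[k]: size of the tensor transmitted when splitting at k
    quant_loss[k]      : accuracy loss from quantizing that tensor
    alpha, beta        : weights trading latency against quantization loss
    """
    best_k, best_cost = 0, float("inf")
    for k in range(len(device_lat)):
        latency = (device_lat[k] + cloud_lat[k]
                   + transmission_latency(activation_bytes[k], bandwidth_bps))
        cost = alpha * latency + beta * quant_loss[k]
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Example: four candidate split points under a 10 Mbit/s uplink.
# Later splits cost more device time but transmit smaller activations.
split = choose_split_point(
    device_lat=[0.00, 0.02, 0.05, 0.09],
    cloud_lat=[0.10, 0.06, 0.03, 0.01],
    activation_bytes=[600_000, 150_000, 40_000, 8_000],
    quant_loss=[0.000, 0.002, 0.004, 0.006],
    bandwidth_bps=10_000_000,
)  # → 2
```

Under these illustrative numbers the middle split wins: the first split is dominated by the 0.48 s activation upload, while the last pays too much device latency and quantization penalty. A real-time variant would re-run this search as the measured bandwidth changes.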
doi_str_mv 10.1109/TMC.2024.3430103
format Magazinearticle
publisher IEEE
coden ITMCCJ
eissn 1558-0660
ieee_id 10601493
orcidid 0009-0006-5356-7360 ; 0000-0002-8696-6898 ; 0009-0004-0964-0378 ; 0000-0003-2965-6196 ; 0000-0001-9452-606X ; 0000-0002-3556-8240
link https://ieeexplore.ieee.org/document/10601493
fulltext fulltext_linktorsrc
identifier ISSN: 1536-1233
ispartof IEEE transactions on mobile computing, 2024-12, Vol.23 (12), p.13076-13094
issn 1536-1233
1558-0660
language eng
recordid cdi_crossref_primary_10_1109_TMC_2024_3430103
source IEEE Electronic Library (IEL)
subjects Adaptation models
Artificial neural networks
Cloud computing
Collaboration
Computational modeling
DNN model partition
Edge intelligence
end-cloud collaborative inference
Real-time systems
resource allocation
Resource management
title Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T07%3A23%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Real-Time%20Adaptive%20Partition%20and%20Resource%20Allocation%20for%20Multi-User%20End-Cloud%20Inference%20Collaboration%20in%20Mobile%20Environment&rft.jtitle=IEEE%20transactions%20on%20mobile%20computing&rft.au=Li,%20Yiran&rft.date=2024-12&rft.volume=23&rft.issue=12&rft.spage=13076&rft.epage=13094&rft.pages=13076-13094&rft.issn=1536-1233&rft.eissn=1558-0660&rft.coden=ITMCCJ&rft_id=info:doi/10.1109/TMC.2024.3430103&rft_dat=%3Ccrossref_RIE%3E10_1109_TMC_2024_3430103%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10601493&rfr_iscdi=true