Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment

The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To this end, collaborative deep inference is proposed, in which the DNN is divided into two parts and executed on the end device and the cloud, respectively.

Detailed description

Bibliographic details
Published in: IEEE transactions on mobile computing 2024-12, Vol.23 (12), p.13076-13094
Main authors: Li, Yiran, Liu, Zhen, Kou, Ze, Wang, Yannan, Zhang, Guoqiang, Li, Yidong, Sun, Yongqi
Format: Magazine article
Language: eng
Subjects:
Online access: Order full text
container_end_page 13094
container_issue 12
container_start_page 13076
container_title IEEE transactions on mobile computing
container_volume 23
creator Li, Yiran
Liu, Zhen
Kou, Ze
Wang, Yannan
Zhang, Guoqiang
Li, Yidong
Sun, Yongqi
description The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To this end, collaborative deep inference is proposed, in which the DNN is divided into two parts and executed on the end device and the cloud, respectively. Selecting the DNN partition point is the key challenge in realizing end-cloud collaborative deep inference, especially in mobile environments with unstable networks. In this paper, we propose a Real-time Adaptive Partition (RAP) framework, in which a fast split-point decision algorithm realizes real-time adaptive DNN model partition in the mobile network, performing a weighted joint optimization of DNN quantization loss, inference latency, and transmission latency. We further propose a Joint Multi-user Model Partition and Resource Allocation (JM-MPRA) algorithm under the RAP framework, which aims to guarantee optimized latency, accuracy, and resource utilization in the multi-user scenario. Experimental evaluations demonstrate the effectiveness of RAP with JM-MPRA in improving the performance of real-time end-cloud collaborative inference in both stable and unstable mobile networks. Compared with state-of-the-art methods, the proposed approaches achieve up to a 5.06× reduction in inference latency and a 1.52% improvement in inference accuracy.
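The abstract describes picking a split point by a weighted joint optimization over quantization loss, on-device and cloud inference latency, and transmission latency. A minimal illustrative sketch of that idea follows; the function names, cost weights, and all numbers are assumptions for illustration, not the paper's actual RAP or JM-MPRA implementation:

```python
def transmission_latency(activation_bytes, bandwidth_bps):
    """Time to upload the intermediate activation tensor to the cloud."""
    return activation_bytes * 8 / bandwidth_bps

def choose_split_point(device_lat, cloud_lat, activation_bytes,
                       quant_loss, bandwidth_bps, alpha=1.0, beta=10.0):
    """Return the layer index minimizing a weighted joint cost.

    device_lat[k]      : latency of running layers before split k on the device
    cloud_lat[k]       : latency of running the remaining layers on the cloud
    activation_bytes[k]: size of the tensor transmitted when splitting at k
    quant_loss[k]      : accuracy loss from quantizing that tensor
    alpha, beta        : weights trading latency against quantization loss
    """
    best_k, best_cost = 0, float("inf")
    for k in range(len(device_lat)):
        latency = (device_lat[k] + cloud_lat[k]
                   + transmission_latency(activation_bytes[k], bandwidth_bps))
        cost = alpha * latency + beta * quant_loss[k]
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Example: four candidate split points under a 10 Mbit/s uplink.
# Later splits cost more device time but transmit smaller activations.
split = choose_split_point(
    device_lat=[0.00, 0.02, 0.05, 0.09],
    cloud_lat=[0.10, 0.06, 0.03, 0.01],
    activation_bytes=[600_000, 150_000, 40_000, 8_000],
    quant_loss=[0.000, 0.002, 0.004, 0.006],
    bandwidth_bps=10_000_000,
)  # → 2
```

Under these illustrative numbers the middle split wins: the first split is dominated by the 0.48 s activation upload, while the last pays too much device latency and quantization penalty. A real-time variant would re-run this search as the measured bandwidth changes.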
doi_str_mv 10.1109/TMC.2024.3430103
format Magazinearticle
publisher IEEE
coden ITMCCJ
eissn 1558-0660
ieee_id 10601493
orcidid 0009-0006-5356-7360 ; 0000-0002-8696-6898 ; 0009-0004-0964-0378 ; 0000-0003-2965-6196 ; 0000-0001-9452-606X ; 0000-0002-3556-8240
link https://ieeexplore.ieee.org/document/10601493
fulltext fulltext_linktorsrc
identifier ISSN: 1536-1233
ispartof IEEE transactions on mobile computing, 2024-12, Vol.23 (12), p.13076-13094
issn 1536-1233
1558-0660
language eng
recordid cdi_crossref_primary_10_1109_TMC_2024_3430103
source IEEE Electronic Library (IEL)
subjects Adaptation models
Artificial neural networks
Cloud computing
Collaboration
Computational modeling
DNN model partition
Edge intelligence
end-cloud collaborative inference
Real-time systems
resource allocation
Resource management
title Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T07%3A23%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Real-Time%20Adaptive%20Partition%20and%20Resource%20Allocation%20for%20Multi-User%20End-Cloud%20Inference%20Collaboration%20in%20Mobile%20Environment&rft.jtitle=IEEE%20transactions%20on%20mobile%20computing&rft.au=Li,%20Yiran&rft.date=2024-12&rft.volume=23&rft.issue=12&rft.spage=13076&rft.epage=13094&rft.pages=13076-13094&rft.issn=1536-1233&rft.eissn=1558-0660&rft.coden=ITMCCJ&rft_id=info:doi/10.1109/TMC.2024.3430103&rft_dat=%3Ccrossref_RIE%3E10_1109_TMC_2024_3430103%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10601493&rfr_iscdi=true