Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment
The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To this end, collaborative deep inference is proposed, in which the DNN is divided into two parts and executed on the end device and the cloud respectively. The selection of the DNN partition point is the key challenge in realizing end-cloud collaborative deep inference, especially in mobile environments with unstable networks. In this paper, we propose a Real-time Adaptive Partition (RAP) framework, in which a fast split point decision algorithm is proposed to realize real-time adaptive DNN model partition in the mobile network. A weighted joint optimization of DNN quantization loss, inference latency and transmission latency is performed. We further propose a Joint Multi-user Model Partition and Resource Allocation (JM-MPRA) algorithm under the RAP framework. JM-MPRA aims to guarantee optimized latency, accuracy and resource utilization in the multi-user scenario. Experimental evaluations demonstrate the effectiveness of RAP with JM-MPRA in improving the performance of real-time end-cloud collaborative inference in both stable and unstable mobile networks. Compared with state-of-the-art methods, the proposed approaches achieve up to a 5.06x decrease in inference latency and a 1.52% improvement in inference accuracy.
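The split-point decision the abstract describes can be illustrated with a minimal sketch: for each candidate layer, sum the on-device latency up to that layer, the time to transmit the (possibly quantized) intermediate activation, and the cloud latency for the remaining layers, plus a weighted penalty for quantization loss, then pick the layer with the lowest total. All function names, weights, and profiling numbers below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of weighted split-point selection (not the paper's RAP
# algorithm): minimize alpha * total latency + beta * quantization loss over
# candidate partition layers.

def choose_split_point(device_ms, cloud_ms, act_bytes, bandwidth_bps,
                       quant_loss, alpha=1.0, beta=100.0):
    """Return (layer index, cost) minimizing the weighted latency/loss cost.

    device_ms[k]  -- cumulative on-device latency through layer k (ms)
    cloud_ms[k]   -- cloud latency for the layers after k (ms)
    act_bytes[k]  -- size of layer k's (possibly quantized) activation (bytes)
    quant_loss[k] -- accuracy loss from quantizing layer k's activation
    """
    best_k, best_cost = None, float("inf")
    for k in range(len(device_ms)):
        # Transmission time of the intermediate activation at layer k.
        tx_ms = act_bytes[k] * 8 / bandwidth_bps * 1000.0
        cost = alpha * (device_ms[k] + tx_ms + cloud_ms[k]) + beta * quant_loss[k]
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

On an unstable network the bandwidth estimate changes frequently, which is why a per-request decision like this has to be fast: the loop is linear in the number of candidate layers, so it can be re-run whenever the measured bandwidth shifts.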
Published in: | IEEE transactions on mobile computing, 2024-12, Vol.23 (12), p.13076-13094 |
---|---|
Format: | Magazine article |
Language: | English |
Online access: | Order full text |
Authors | Li, Yiran; Liu, Zhen; Kou, Ze; Wang, Yannan; Zhang, Guoqiang; Li, Yidong; Sun, Yongqi |
DOI | 10.1109/TMC.2024.3430103 |
ISSN | 1536-1233 |
EISSN | 1558-0660 |
Source | IEEE Electronic Library (IEL) |
Subjects | Adaptation models; Artificial neural networks; Cloud computing; Collaboration; Computational modeling; DNN model partition; Edge intelligence; end-cloud collaborative inference; Real-time systems; resource allocation; Resource management |