Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment
Published in: IEEE Transactions on Mobile Computing, Dec. 2024, Vol. 23, No. 12, pp. 13076-13094
Format: Journal article
Language: English
Abstract: The deployment of Deep Neural Networks (DNNs) requires significant computational and storage resources, which is challenging for resource-constrained end devices. To address this, collaborative deep inference has been proposed, in which the DNN is divided into two parts that are executed on the end device and the cloud, respectively. Selecting the DNN partition point is the key challenge in realizing end-cloud collaborative deep inference, especially in mobile environments with unstable networks. In this paper, we propose a Real-time Adaptive Partition (RAP) framework, in which a fast split-point decision algorithm realizes real-time adaptive DNN model partitioning in mobile networks by performing a weighted joint optimization of DNN quantization loss, inference latency, and transmission latency. We further propose a Joint Multi-user Model Partition and Resource Allocation (JM-MPRA) algorithm under the RAP framework, which jointly optimizes latency, accuracy, and resource utilization in multi-user scenarios. Experimental evaluations demonstrate the effectiveness of RAP with JM-MPRA in improving the performance of real-time end-cloud collaborative inference in both stable and unstable mobile networks. Compared with state-of-the-art methods, the proposed approaches achieve up to a 5.06× reduction in inference latency and a 1.52% improvement in inference accuracy.
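The split-point decision described in the abstract can be pictured as minimizing a weighted cost over candidate partition layers. The sketch below is a minimal illustration of that idea, not the paper's RAP or JM-MPRA implementation: the function name `choose_split_point`, the per-layer profiles, the bandwidth estimate, and the weights `w_latency` / `w_loss` are all hypothetical placeholders, and RAP's actual decision algorithm, quantization scheme, and multi-user resource allocation are not reproduced here.

```python
# A minimal sketch (assumed, not the paper's method): pick a DNN split point by
# jointly weighting quantization loss, on-device/cloud inference latency, and
# uplink transmission latency. All profiles, weights, and names are hypothetical.

def choose_split_point(device_ms, cloud_ms, act_mb, quant_loss,
                       input_mb, uplink_mbytes_per_s,
                       w_latency=1.0, w_loss=50.0):
    """Return (k, cost): layers [0, k) run on the device, layers [k, n) on the cloud.

    device_ms[i]  : profiled latency of layer i on the end device (ms)
    cloud_ms[i]   : profiled latency of layer i on the cloud (ms)
    act_mb[i]     : size of layer i's quantized output activation (MB)
    quant_loss[i] : estimated accuracy drop from quantizing that activation
    input_mb      : size of the raw input (MB), sent when k == 0 (cloud-only)
    """
    n = len(device_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):                     # k = 0: cloud-only, k = n: device-only
        device_part = sum(device_ms[:k])       # layers executed locally
        cloud_part = sum(cloud_ms[k:])         # layers executed remotely
        sent_mb = input_mb if k == 0 else act_mb[k - 1]
        tx_ms = 0.0 if k == n else 1000.0 * sent_mb / uplink_mbytes_per_s
        loss = 0.0 if k in (0, n) else quant_loss[k - 1]  # no quantized activation at the extremes
        cost = w_latency * (device_part + tx_ms + cloud_part) + w_loss * loss
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


if __name__ == "__main__":
    # Hypothetical per-layer profiles for a five-layer model.
    device_ms = [4.0, 6.0, 8.0, 10.0, 3.0]
    cloud_ms = [0.5, 0.8, 1.0, 1.2, 0.4]
    act_mb = [1.5, 0.8, 0.4, 0.1, 0.01]
    q_loss = [0.02, 0.01, 0.008, 0.004, 0.0]
    k, cost = choose_split_point(device_ms, cloud_ms, act_mb, q_loss,
                                 input_mb=0.6, uplink_mbytes_per_s=2.0)
    print(f"split after layer {k}, weighted cost {cost:.2f}")
```

In a real deployment the static profiles would presumably be replaced by online measurements of device load and uplink bandwidth, with the decision re-run whenever conditions change, which is the real-time adaptive aspect the abstract emphasizes.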
ISSN: 1536-1233, 1558-0660
DOI: 10.1109/TMC.2024.3430103