Automatic Pipeline Parallelism: A Parallel Inference Framework for Deep Learning Applications in 6G Mobile Communication Systems

Bibliographic Details
Published in:	IEEE journal on selected areas in communications 2023-07, Vol.41 (7), p.1-1
Main Authors:	Shi, Hongjian; Zheng, Weichu; Liu, Zifei; Ma, Ruhui; Guan, Haibing
Format:	Article
Language:	English
Subjects:	6G mobile communication; Communication; Computational modeling; Data models; Deep learning; Distributed Learning; Edge computing; Efficiency; Hardware Profiling; Inference; Machine learning; Mobile communication systems; Modules; Network latency; Parallel Inference; Parallel processing; Pipelines; Pipelining (computers); Reliability; Schedules; System Heterogeneity; Task analysis; Task Scheduling; Training; Wireless communications

Description
Summary:	With the rapid development of wireless communication, achieving neXt generation Ultra-Reliable and Low-Latency Communications (xURLLC) in 6G mobile communication systems has become a critical problem. Among the many xURLLC applications, deep learning model inference needs to become more efficient. Because the hardware environment in 6G is heterogeneous, parallel schedules from distributed machine learning and edge computing have been borrowed to tackle the efficiency problem. However, traditional parallel schedules suffer from high latency, low throughput, and low device utility. In this paper, we propose Automatic Pipeline Parallelism (AP²), a parallel inference framework for deep learning applications in 6G mobile communication systems, to improve model inference efficiency while maintaining reliability. AP² contains three sub-modules. A task-device affinity predictor predicts a task's expected execution time on a given device. A parallel inference arrangement optimizer finds the most suitable device for each task. A parallel inference scheduler converts the arrangement into a schedule that can be executed directly in the system. The experimental results show that AP² achieves better latency, throughput, reliability, and device utility than other parallel schedules. The experiments also validate the sub-module designs.
ISSN:	0733-8716 1558-0008
DOI:	10.1109/JSAC.2023.3280970
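
The three-stage flow named in the summary (predict per-device execution time, pick a device for each task, turn the arrangement into an executable schedule) can be illustrated with a toy sketch. This is not the AP² algorithm: the record gives no implementation details, so the cost model (`work / speed`), the greedy earliest-finish heuristic, the assumption that tasks are independent, and every name below (`Task`, `Device`, `predict_time`, `arrange`, `build_schedule`) are hypothetical stand-ins chosen only to show how the stages hand data to one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    work: float  # hypothetical cost in abstract work units


@dataclass(frozen=True)
class Device:
    name: str
    speed: float  # hypothetical throughput in work units per second


def predict_time(task: Task, device: Device) -> float:
    """Stand-in for an affinity predictor: expected run time of task on device."""
    return task.work / device.speed


def arrange(tasks: list[Task], devices: list[Device]) -> dict[str, str]:
    """Stand-in optimizer: greedily give each task to the device that would
    finish it earliest, given the load already assigned to that device."""
    busy_until = {d.name: 0.0 for d in devices}
    arrangement: dict[str, str] = {}
    for task in tasks:
        best = min(devices, key=lambda d: busy_until[d.name] + predict_time(task, d))
        arrangement[task.name] = best.name
        busy_until[best.name] += predict_time(task, best)
    return arrangement


def build_schedule(
    tasks: list[Task], devices: list[Device], arrangement: dict[str, str]
) -> dict[str, list[tuple[str, float, float]]]:
    """Stand-in scheduler: turn the task-to-device mapping into per-device
    timelines of (task name, start time, end time)."""
    by_name = {d.name: d for d in devices}
    clock = {d.name: 0.0 for d in devices}
    schedule: dict[str, list[tuple[str, float, float]]] = {d.name: [] for d in devices}
    for task in tasks:
        device = by_name[arrangement[task.name]]
        start = clock[device.name]
        end = start + predict_time(task, device)
        schedule[device.name].append((task.name, start, end))
        clock[device.name] = end
    return schedule


# Made-up workload: three independent tasks, two devices of different speed.
tasks = [Task("conv1", 4.0), Task("conv2", 6.0), Task("fc", 2.0)]
devices = [Device("phone", 1.0), Device("edge", 2.0)]

arrangement = arrange(tasks, devices)
schedule = build_schedule(tasks, devices, arrangement)
for device_name, timeline in schedule.items():
    print(device_name, timeline)
```

A real system would replace `predict_time` with a profiled or learned model and would have to respect data dependencies between model layers, which this sketch deliberately ignores.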