Deep-Deterministic Policy Gradient based Multi-Resource Allocation in Edge-Cloud System: A distributed approach

Bibliographic Details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: Qadeer, Arslan; Lee, Myung J.
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Edge Cloud (EC) empowers beyond-5G (B5G) wireless networks to cope with large-scale, real-time Internet-of-Things (IoT) traffic by minimizing latency and providing compute power at the edge of the network. Because the EC has limited resources compared to the back-end cloud (BC), intelligent resource management techniques become imperative. This paper studies the problem of multi-resource allocation (MRA), in terms of compute and wireless resources, in an integrated EC and BC environment. Machine learning-based approaches are emerging to solve such optimization problems, but traditional discrete-action-space methods are difficult to adopt because of their high dimensionality. To this end, we propose a deep deterministic policy gradient (DDPG) based temporal feature learning attentional network (TFLAN) model to address the MRA problem. TFLAN combines convolution, gated recurrent unit, and attention layers to mine local and long-term temporal information from task sequences for accurate function approximation. A novel heuristic-based priority experience replay (hPER) method is formulated to accelerate convergence. Further, a pruning principle helps the TFLAN agent significantly reduce computational complexity and balance the load among base stations and servers to minimize the rejection rate. Lastly, a data parallelism technique is adopted for distributed training to meet the needs of high-volume IoT traffic in the EC environment. Experimental results demonstrate that the distributed training approach suits the problem scale well and accelerates the learning process. We validate the proposed framework by comparing it with five state-of-the-art RL agents. Our agent converges quickly and, on average, achieves up to 28% and 72% reductions in operational cost and rejection rate, respectively, and up to a 32% gain in quality of experience compared to the most advanced DDPG agent.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3249153
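
To make the TFLAN description in the abstract more concrete, below is a minimal PyTorch sketch of a DDPG-style actor that stacks convolution, GRU, and attention layers over a task sequence. The class name, layer sizes, and input/output dimensions are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TFLANActorSketch(nn.Module):
    # Illustrative Conv -> GRU -> attention actor for a DDPG-style agent.
    # feat_dim, hidden, and action_dim are placeholder sizes, not from the paper.
    def __init__(self, feat_dim=16, hidden=64, action_dim=8):
        super().__init__()
        # 1-D convolution mines local temporal patterns in the task sequence
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
        # GRU captures longer-term temporal dependencies
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        # single-head self-attention weighs the most informative time steps
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        # deterministic action head, squashed to [-1, 1] as in DDPG
        self.head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, x):                      # x: (batch, seq_len, feat_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.gru(h)                     # (batch, seq_len, hidden)
        h, _ = self.attn(h, h, h)              # self-attention over time steps
        return self.head(h[:, -1])             # action from the final time step

actor = TFLANActorSketch()
actions = actor(torch.randn(4, 32, 16))        # -> tensor of shape (4, 8)

In the paper, such an actor would be trained together with a DDPG critic and the heuristic-based priority experience replay (hPER), pruning, and data-parallel training summarized in the abstract; those components are specific to the authors' method and are not reproduced in this sketch.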