A sub-action aided deep reinforcement learning framework for latency-sensitive network slicing

Bibliographic Details
Published in: Computer Networks (Amsterdam, Netherlands: 1999), 2022-11, Vol. 217, p. 109279, Article 109279
Authors: Xiao, Da; Chen, Shuo; Ni, Wei; Zhang, Jie; Zhang, Andrew; Liu, Renping
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Network slicing is a core technique of fifth-generation (5G) systems and beyond. To maximize the number of accepted network slices with limited hardware resources, service providers must avoid over-provisioning quality-of-service (QoS), which could otherwise prevent them from lowering capital expenditures (CAPEX) and operating expenses (OPEX) for 5G infrastructure. In this paper, we propose a sub-action aided double deep Q-network (SADDQN)-based network slicing algorithm for latency-aware services. Specifically, we model network slicing as a Markov decision process (MDP), where we treat virtual network function (VNF) placements as the actions of the MDP and define a reward function based on cost and service priority. Furthermore, we adopt the Dijkstra algorithm to determine the forwarding graph (FG) embedding for a given VNF placement, and we design a resource allocation algorithm, binary search assisted gradient descent (BSAGD), to allocate resources to VNFs given the VNF-FG placement. For every service request, we first use the DDQN to choose an MDP action that determines the VNF placement (main action). Next, we employ the Dijkstra algorithm (first-phase sub-action) to find the shortest path for each pair of adjacent VNFs in the given VNF chain. Finally, we apply the BSAGD (second-phase sub-action) to realize the service at minimum cost. The joint action yields an MDP reward that is used to train the DDQN. Numerical evaluations show that, compared to state-of-the-art algorithms, the proposed algorithm improves cost-efficiency while favoring higher-priority services and maximizing the acceptance ratio.
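
To make the pipeline in the abstract concrete, the sketch below walks one service request through the three stages: a learned main action picks VNF placements, Dijkstra (first-phase sub-action) embeds the forwarding graph, and a binary-search minimizer standing in for BSAGD (second-phase sub-action) allocates resources; the resulting reward trains the placement policy. Everything here is an illustrative assumption rather than the authors' implementation: the toy topology, the cost and latency models, the tabular Q-function used in place of the paper's DDQN, and all identifiers (GRAPH, choose_placement, serve_request, bsagd) are hypothetical.

import heapq
import random

# Toy substrate network: node -> [(neighbor, link delay)]. Assumed topology.
GRAPH = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 1.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 1.0), ("D", 1.0)],
    "D": [("B", 5.0), ("C", 1.0)],
}
NODES = list(GRAPH)

# Tabular Q-function: a stand-in for the paper's DDQN (state -> action values).
Q = {}

def dijkstra(graph, src, dst):
    # First-phase sub-action: shortest path by link delay (FG embedding).
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

def bsagd(cost, lo, hi, tol=1e-4):
    # Second-phase sub-action: bisection on the slope of a convex cost,
    # a simplified proxy for binary search assisted gradient descent.
    while hi - lo > tol:
        mid, eps = 0.5 * (lo + hi), 1e-6
        slope = (cost(mid + eps) - cost(mid - eps)) / (2 * eps)
        if slope > 0:
            hi = mid  # minimizer lies to the left
        else:
            lo = mid  # minimizer lies to the right
    return 0.5 * (lo + hi)

def choose_placement(state, chain_len, eps=0.2):
    # Main action: epsilon-greedy choice of a node for each VNF in the chain.
    actions = Q.setdefault(state, {})
    if random.random() < eps or not actions:
        return tuple(random.choice(NODES) for _ in range(chain_len))
    return max(actions, key=actions.get)

def serve_request(state, priority=1.0):
    placement = choose_placement(state, chain_len=2)
    # Sub-action 1: embed the forwarding graph along shortest paths.
    path_delay = sum(dijkstra(GRAPH, u, v)[1]
                     for u, v in zip(placement, placement[1:]) if u != v)

    def cost(x):
        # Assumed per-service cost: CPU price 0.5/unit plus end-to-end
        # latency (path delay + processing delay modeled as 1/x).
        return 0.5 * x + path_delay + 1.0 / x

    # Sub-action 2: allocate CPU at minimum cost.
    x = bsagd(cost, 0.01, 10.0)
    reward = priority - cost(x)  # reward trades off priority against cost
    old = Q[state].get(placement, 0.0)
    Q[state][placement] = old + 0.1 * (reward - old)  # bandit-style update
    return placement, x, reward

random.seed(0)
for _ in range(200):
    serve_request("slice-request")
print("learned placement:", max(Q["slice-request"], key=Q["slice-request"].get))

Under this toy cost, co-locating the two VNFs eliminates path delay, so the learned placement converges to a single node. The point of the decomposition is that both sub-actions are solved exactly (a shortest-path search and a one-dimensional convex minimization), leaving only the combinatorial placement decision to the learner.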
ISSN: 1389-1286, 1872-7069
DOI: 10.1016/j.comnet.2022.109279