Reinforcement Learning-Based Joint User Scheduling and Link Configuration in Millimeter-Wave Networks

Bibliographic Details
Published in:	IEEE Transactions on Wireless Communications, 2023-05, Vol. 22 (5), p. 3038-3054
Main authors:	Zhang, Yi; Heath, Robert W.
Format:	Article
Language:	English
Subjects:	Algorithms; beam tracking; codebook selection; Complexity; Configurations; Control systems design; Decision making; Deep learning; deep reinforcement learning; Delay; Delays; Dynamic scheduling; Machine learning; Millimeter wave; Millimeter wave communication; Millimeter waves; mobility; multi-armed bandit; Multi-armed bandit problems; Neural networks; Optimization; proximal policy optimization; relay selection; Relays; Scheduling; Thompson sampling; Training; user scheduling; Wireless communication

Description
Abstract:	In this paper, we develop algorithms for joint user scheduling and three types of link configuration (relay selection, codebook optimization, and beam tracking) in millimeter wave (mmWave) networks. Our goal is to design an online controller that dynamically schedules users and configures their links to minimize system delay. To solve this complex scheduling problem, we model it as a dynamic decision-making process and develop two reinforcement learning-based solutions. The first solution is based on deep reinforcement learning (DRL) and uses proximal policy optimization to train a neural network-based solution. Because DRL can have high sample complexity, we also propose an empirical multi-armed bandit (MAB)-based solution, which decomposes the decision-making process into a sequence of sub-actions and uses classic max-weight scheduling and Thompson sampling to decide those sub-actions. Our evaluation of the proposed solutions confirms their effectiveness in providing acceptable system delay. It also shows that the DRL-based solution has better delay performance, while the MAB-based solution has a faster training process.
ISSN:	1536-1276 1558-2248
DOI:	10.1109/TWC.2022.3215922
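
Illustrative note: the abstract describes an MAB-based controller that splits each decision into sub-actions handled by max-weight scheduling and Thompson sampling. The Python sketch below shows one generic way these two ideas can be combined; it is not the authors' algorithm. All names (`BetaThompsonArm`, `choose_action`, `simulate`), the Bernoulli success model, the single-user-per-slot assumption, and the Beta(1, 1) prior are assumptions made only for illustration.

```python
import random


class BetaThompsonArm:
    """Beta-Bernoulli posterior over one link configuration's success rate.

    Hypothetical model: each transmission either succeeds or fails.
    """

    def __init__(self):
        self.successes = 1  # Beta(1, 1) uniform prior (an assumption)
        self.failures = 1

    def sample(self, rng):
        # Draw a plausible success probability from the posterior.
        return rng.betavariate(self.successes, self.failures)

    def update(self, success):
        if success:
            self.successes += 1
        else:
            self.failures += 1


def choose_action(queues, arms, rng):
    """Return (user, config) for one slot.

    Sub-action 1: Thompson sampling picks a configuration for each user.
    Sub-action 2: max-weight picks the user whose backlog times sampled
    success probability is largest.
    """
    best = None
    for user, backlog in enumerate(queues):
        samples = [arm.sample(rng) for arm in arms[user]]
        config = max(range(len(samples)), key=samples.__getitem__)
        weight = backlog * samples[config]
        if best is None or weight > best[0]:
            best = (weight, user, config)
    return best[1], best[2]


def simulate(true_success, arrival_prob, slots, seed=0):
    """Run a toy queueing simulation; true_success[u][c] is hidden from the controller."""
    rng = random.Random(seed)
    n_users = len(true_success)
    queues = [0] * n_users
    arms = [[BetaThompsonArm() for _ in row] for row in true_success]
    for _ in range(slots):
        # Random packet arrivals, one per user per slot at most.
        for u in range(n_users):
            if rng.random() < arrival_prob:
                queues[u] += 1
        user, config = choose_action(queues, arms, rng)
        if queues[user] > 0:
            success = rng.random() < true_success[user][config]
            arms[user][config].update(success)
            if success:
                queues[user] -= 1
    return queues, arms
```

For example, `simulate([[0.2, 0.9], [0.5, 0.6]], arrival_prob=0.3, slots=1000)` returns the final queue backlogs and the learned posteriors for two users with two hypothetical configurations each. The paper's actual state, action decomposition, and delay objective are more detailed than this toy model.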