Multi-Agent Reinforcement Learning for Dynamic Topology Optimization of Mesh Wireless Networks

Bibliographic Details
Published in:	IEEE Transactions on Wireless Communications, 2024-09, Vol. 23 (9), p. 10501-10513
Main authors:	Sun, Wei; Lv, Qiushuo; Xiao, Yang; Liu, Zhi; Tang, Qingwei; Li, Qiyue; Mu, Daoming
Format:	Article
Language:	English
Keywords:	Actor-critic; ad hoc wireless network; Algorithms; Convergence; Delay; Delays; Efficiency; Logic gates; Machine learning; mesh wireless network; Multiagent systems; Network topologies; Network topology; reinforcement learning; Topology; Topology optimization; Trajectory; Vectors; Wireless networks

Description
Abstract:	In Mesh Wireless Networks (MWNs), network coverage is extended by connecting Access Points (APs) in a mesh topology, where multi-hop frame transmission must sustain performance metrics such as end-to-end (E2E) delay and channel efficiency. Several recent studies have focused on minimizing E2E delay, but these methods cannot adapt to the dynamic nature of MWNs. Reinforcement-learning-based methods adapt better to such dynamics but suffer from high-dimensional action spaces, which slow convergence. In this paper, we propose a multi-agent actor-critic reinforcement learning (MACRL) algorithm to optimize multiple objectives, specifically minimizing E2E delay and improving channel efficiency. First, to reduce the action space and speed up convergence during dynamic optimization, a centralized-critic-distributed-actor scheme is proposed. Then, a multi-objective reward balancing method is designed to dynamically balance MWN performance between E2E delay and channel efficiency. Finally, the trained MACRL algorithm is deployed in the QualNet simulator to verify its effectiveness.
ISSN:	1536-1276 1558-2248
DOI:	10.1109/TWC.2024.3372694
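
Illustrative note: the abstract attributes slow convergence to high-dimensional action spaces and names two remedies, distributed actors and a balanced multi-objective reward. The sketch below is not the authors' implementation. It only shows the counting argument (a single joint policy faces an exponential number of joint actions, while per-agent actors face a linear number of outputs) and one possible weighted-sum reward. All function names, the fixed `weight`, and the `max_delay_s` normalizer are hypothetical; the record does not state the paper's actual balancing rule, which is described as dynamic.

```python
def joint_action_count(num_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions a single centralized actor would choose among."""
    return actions_per_agent ** num_agents


def per_agent_output_count(num_agents: int, actions_per_agent: int) -> int:
    """Total policy outputs when every agent (e.g. each AP) has its own actor."""
    return num_agents * actions_per_agent


def balanced_reward(delay_s: float, efficiency: float,
                    weight: float, max_delay_s: float) -> float:
    """Hypothetical weighted sum of a delay score and channel efficiency.

    delay_s     : measured end-to-end delay in seconds
    efficiency  : channel efficiency, assumed to lie in [0, 1]
    weight      : share given to the delay term, in [0, 1] (assumed fixed here)
    max_delay_s : delay at which the delay score drops to zero (assumption)
    """
    # Map delay to [0, 1], where 1 means no delay and 0 means max_delay_s or worse.
    delay_score = 1.0 - min(delay_s / max_delay_s, 1.0)
    return weight * delay_score + (1.0 - weight) * efficiency


if __name__ == "__main__":
    agents, actions = 5, 4
    print("joint actions:", joint_action_count(agents, actions))
    print("per-agent outputs:", per_agent_output_count(agents, actions))
    print("reward:", balanced_reward(0.05, 0.8, weight=0.5, max_delay_s=0.1))
```

With 5 agents and 4 actions each, the joint space has 1024 actions while the distributed actors expose 20 outputs in total, which is the kind of reduction the abstract refers to.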