Centralized and Distributed Deep Reinforcement Learning Methods for Downlink Sum-Rate Optimization

Bibliographic Details
Published in: IEEE Transactions on Wireless Communications, 2020-12, Vol. 19 (12), p. 8410-8426
Main authors: Khan, Ahmad Ali; Adve, Raviraj S.
Format: Article
Language: English
Description
Abstract: For a multi-cell, multi-user cellular network, downlink sum-rate maximization through power allocation is a nonconvex and NP-hard optimization problem. In this article, we present an effective approach to solving this problem through single- and multi-agent actor-critic deep reinforcement learning (DRL). Specifically, we use finite-horizon trust region optimization. Through extensive simulations, we show that we can simultaneously achieve higher spectral efficiency than state-of-the-art optimization algorithms such as weighted minimum mean-squared error (WMMSE) and fractional programming (FP), while offering execution times more than two orders of magnitude faster than these approaches. Additionally, the proposed trust region methods demonstrate superior performance and convergence properties compared to the Advantage Actor-Critic (A2C) DRL algorithm. In contrast to prior approaches, the proposed decentralized DRL approaches allow for distributed optimization with limited channel state information (CSI) and controllable information exchange between base stations (BSs), while offering competitive performance and reduced training times.
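
For context, the optimization problem referred to in the abstract is downlink sum-rate maximization over transmit powers. A common single-user-per-cell formulation (the paper's exact system model and constraints may differ) is

\[
\max_{\{p_k\}} \; \sum_{k=1}^{K} \log_2\!\left(1 + \frac{g_{k,k}\,p_k}{\sigma^2 + \sum_{j\neq k} g_{k,j}\,p_j}\right)
\quad \text{s.t.} \quad 0 \le p_k \le P_{\max},\; k = 1,\dots,K,
\]

where p_k is the transmit power of the BS serving user k, g_{k,j} is the channel gain from BS j to user k, \sigma^2 is the receiver noise power, and P_{\max} is the per-BS power budget. The interference coupling in the denominators makes the objective nonconvex in the powers. Below is a minimal sketch of this objective, which is also the kind of per-step reward a DRL power-control agent would maximize; the variable names, array shapes, and single-user-per-cell assumption are illustrative and not taken from the paper.

import numpy as np

def downlink_sum_rate(p, G, noise_power=1e-9):
    # p: length-K vector, transmit power of the BS serving user k (illustrative
    #    single-user-per-cell setup; the paper's system model may differ).
    # G: K x K channel gain matrix, G[k, j] = gain from BS j to user k.
    p = np.asarray(p, dtype=float)
    desired = np.diag(G) * p              # received power of each user's own signal
    interference = G @ p - desired        # total inter-cell interference per user
    sinr = desired / (noise_power + interference)
    return float(np.sum(np.log2(1.0 + sinr)))   # sum spectral efficiency, bits/s/Hz

# Example: 3 cells with random channel gains, every BS transmitting at full power
rng = np.random.default_rng(0)
G = rng.exponential(scale=1e-7, size=(3, 3))
print(downlink_sum_rate(np.full(3, 1.0), G))
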
ISSN: 1536-1276, 1558-2248
DOI: 10.1109/TWC.2020.3022705