Energy-efficient access point clustering and power allocation in cell-free massive MIMO networks: a hierarchical deep reinforcement learning approach


Bibliographic Details
Published in: EURASIP Journal on Advances in Signal Processing 2024-12, Vol. 2024 (1), p. 18-25, Article 18
Main Authors: Tan, Fangqing; Deng, Quanxuan; Liu, Qiang
Format: Article
Language: English
Online Access: Full text
Description
Summary: Cell-free massive multiple-input multiple-output (CF-mMIMO) has attracted considerable attention due to its potential for delivering high data rates and energy efficiency (EE). In this paper, we investigate downlink resource allocation in CF-mMIMO systems. A hierarchical deep deterministic policy gradient (H-DDPG) framework is proposed to jointly optimize access point (AP) clustering and power allocation. The framework uses two control networks operating on different timescales to enhance the downlink EE of CF-mMIMO systems by cooperatively optimizing AP clustering and power allocation. In this framework, the high layer handles the system-level problem, namely AP clustering, improving the wireless network configuration by applying DDPG on a large timescale while meeting the minimum spectral efficiency (SE) constraint of each user. The low layer solves the link-level sub-problem, namely power allocation, reducing interference between APs and improving transmission performance by applying DDPG on a small timescale while meeting the maximum transmit power constraint of each AP. The two DDPG agents are trained separately, allowing them to learn from the environment and gradually improve their policies to maximize the system EE. Numerical results validate the effectiveness of the proposed algorithm in terms of its convergence speed, SE, and EE.
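The two-timescale control structure described in the summary can be sketched as follows. This is a minimal illustrative skeleton, not the paper's method: the DDPG actors are stubbed with random policies, and all sizes (number of APs, users, timescale period, power limit) are assumed for illustration. Only the timing relationship between the two layers and the per-AP power constraint are modeled.

```python
import numpy as np

# Hypothetical sketch of a two-timescale hierarchical control loop in the
# spirit of H-DDPG. Real DDPG actors (neural networks trained from replay
# buffers) are replaced here by random stand-in policies.

rng = np.random.default_rng(0)

N_AP, N_UE = 8, 4      # illustrative numbers of APs and users
P_MAX = 1.0            # assumed per-AP maximum transmit power
LARGE_TS = 5           # high-level agent acts once every LARGE_TS steps

def high_level_policy(state):
    """Large-timescale agent: pick a binary AP-to-user clustering matrix."""
    logits = rng.normal(size=(N_AP, N_UE)) + state.mean()
    return (logits > 0).astype(float)

def low_level_policy(cluster, state):
    """Small-timescale agent: allocate power over the active AP-user links,
    then rescale each AP's row so its total power never exceeds P_MAX."""
    raw = rng.uniform(size=(N_AP, N_UE)) * cluster
    row = raw.sum(axis=1, keepdims=True)
    scale = np.where(row > P_MAX, P_MAX / np.maximum(row, 1e-12), 1.0)
    return raw * scale

state = rng.normal(size=(N_AP, N_UE))   # stand-in for channel-state features
cluster = high_level_policy(state)
for t in range(20):
    if t % LARGE_TS == 0:                       # large timescale: re-cluster
        cluster = high_level_policy(state)
    power = low_level_policy(cluster, state)    # small timescale: power
    assert np.all(power.sum(axis=1) <= P_MAX + 1e-9)  # per-AP constraint
    state = 0.9 * state + 0.1 * rng.normal(size=(N_AP, N_UE))
```

In the paper, each layer would instead be a trained DDPG agent with its own actor-critic networks and reward (system EE subject to the SE and power constraints); the sketch only shows how the two agents interleave on their respective timescales.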
ISSN: 1687-6180
1687-6172
DOI:10.1186/s13634-024-01111-9