Deep Reinforcement Learning for User Association and Resource Allocation in Heterogeneous Cellular Networks

Bibliographic Details
Published in: IEEE Transactions on Wireless Communications, 2019-11, Vol. 18 (11), p. 5141-5152
Main Authors: Zhao, Nan; Liang, Ying-Chang; Niyato, Dusit; Pei, Yiyang; Wu, Minghu; Jiang, Yunhao
Format: Article
Language: English
Description
Abstract: Heterogeneous cellular networks can offload mobile traffic and reduce deployment costs, and are therefore considered a promising technique for next-generation wireless networks. Due to the non-convex and combinatorial characteristics of the problem, it is challenging to obtain an optimal strategy for joint user association and resource allocation. In this paper, a reinforcement learning (RL) approach is proposed to achieve the maximum long-term overall network utility while guaranteeing the quality-of-service requirements of user equipments (UEs) in the downlink of heterogeneous cellular networks. A distributed optimization method based on multi-agent RL is developed. Moreover, to handle the computational cost of the large action space, a multi-agent deep RL method is proposed. Specifically, the state, action, and reward function are defined for the UEs, and a dueling double deep Q-network (D3QN) strategy is introduced to obtain a nearly optimal policy. Through message passing, the distributed UEs can obtain the global state space with a small communication overhead. With the double-Q strategy and the dueling architecture, D3QN converges rapidly to a subgame-perfect Nash equilibrium. Simulation results demonstrate that D3QN achieves better performance than other RL approaches in solving large-scale learning problems.
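
As context for the abstract, below is a minimal PyTorch sketch of the two generic ingredients that define D3QN: a dueling network head (separate value and advantage streams) and a double-Q target (the online network selects the next action, the target network evaluates it). The layer sizes and the names DuelingQNetwork and double_q_target are illustrative assumptions, not the authors' implementation, which additionally involves per-UE states, message passing, and a multi-agent game not shown here.

```python
# Minimal sketch of the dueling double DQN (D3QN) idea; network layout,
# layer sizes, and helper names are assumptions for illustration only.
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: a shared trunk feeding separate value and
    advantage streams, combined as Q(s, a) = V(s) + A(s, a) - mean(A)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        a = self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_q_target(online: DuelingQNetwork, target: DuelingQNetwork,
                    reward: torch.Tensor, next_state: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-Q target: the online net picks the greedy next action and the
    target net evaluates it, reducing the overestimation bias of plain DQN."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In a multi-agent setup of the kind the abstract describes, each UE would hold such a network over its (message-passing-augmented) state, with actions ranging over base-station and resource choices; the training loop, replay buffer, and reward shaping are beyond this sketch.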
ISSN: 1536-1276
eISSN: 1558-2248
DOI: 10.1109/TWC.2019.2933417