Online reinforcement learning for adaptive interference coordination

Bibliographic Details
Published in: Transactions on Emerging Telecommunications Technologies, 2020-10, Vol. 31 (10), p. n/a
Authors: Alcaraz, Juan J.; Ayala‐Romero, Jose A.; Vales‐Alonso, Javier; Losilla‐López, Fernando
Format: Article
Language: English
Online access: Full text
Description
Abstract: Heterogeneous networks (HetNets), in which small cells overlay macro cells, are a cost‐effective approach to increase the capacity of cellular networks. However, HetNets have raised new issues related to cell association and interference management. In particular, the optimal configuration of interference coordination (IC) parameters is a challenging task because it depends on multiple stochastic processes such as the locations of the users, the traffic demands, or the strength of the received signals. This work proposes a self‐optimization algorithm capable of finding the optimal configuration in an operating network. We address the problem using a reinforcement learning (RL) approach, in which the actions are the IC configurations, whose performances are initially unknown. The main difficulty is that, due to the variable network conditions, the performance of each action may change over time. Our proposal is based on two main elements: the sequential exploration of subsets of actions (exploration regions), and an optimal stopping (OS) strategy for deciding when to end the current exploration and start a new one. For our algorithm, referred to as local exploration with optimal stopping (LEOS), we provide theoretical bounds on its long‐term regret per sample and its convergence time. We compare LEOS to state‐of‐the‐art learning algorithms based on multi‐armed bandits and policy gradient RL. Considering different changing rates in the network conditions, our numerical results show that LEOS outperforms the first alternative by 22% and the second by 48% in terms of average regret per sample.

We propose an online reinforcement learning algorithm for adjusting the interference coordination parameters in an operating heterogeneous network. Our proposal combines elements from multi‐armed bandits, sequential hypothesis testing, and stochastic approximation.
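The abstract describes LEOS only at a high level, but its two main elements (local exploration regions and a stopping rule that ends each exploration) can be illustrated with a rough Python sketch. Everything below is an assumption, not the authors' algorithm: the function leos_sketch, the reward_fn interface, and all parameter names are hypothetical; the exploration region is a naive neighbourhood on a one-dimensional discretised configuration space; and the confidence-gap test merely stands in for the optimal stopping strategy analysed in the paper.

import numpy as np

def leos_sketch(reward_fn, n_actions=16, region_size=3, horizon=5000,
                gap_threshold=0.05, max_pulls=200, seed=0):
    # Illustrative local-exploration-with-optimal-stopping loop.
    # reward_fn(action) returns a noisy reward for one IC configuration
    # (hypothetical interface). The confidence-gap test below is a simple
    # stand-in for the paper's optimal stopping (OS) strategy.
    rng = np.random.default_rng(seed)
    best = int(rng.integers(n_actions))   # current incumbent configuration
    t = 0
    while t < horizon:
        # Exploration region: the incumbent plus its neighbours in the
        # (discretised) IC parameter space, wrapping for simplicity.
        region = [(best + d) % n_actions
                  for d in range(-region_size, region_size + 1)]
        sums = np.zeros(len(region))
        counts = np.zeros(len(region))
        for pull in range(max_pulls):     # round-robin sampling of the region
            i = pull % len(region)
            sums[i] += reward_fn(region[i])
            counts[i] += 1
            t += 1
            if counts.min() == 0:
                continue                  # sample every action at least once
            means = sums / counts
            order = np.argsort(means)[::-1]
            gap = means[order[0]] - means[order[1]]
            radius = np.sqrt(2.0 * np.log(max(t, 2)) / counts[order[0]])
            # Sequential test: stop exploring once the leader's advantage
            # exceeds both a fixed threshold and its confidence radius.
            if gap > max(gap_threshold, radius):
                break
        # Recentre the region on the empirical winner and explore again,
        # which lets the algorithm track rewards that drift over time.
        best = region[int(np.argmax(sums / np.maximum(counts, 1)))]
    return best

As a toy usage example, reward_fn = lambda a: -abs(a - 10) / 16 + rng.normal(scale=0.1) should make the loop settle near action 10; making the optimum drift with t would exercise the re-exploration behaviour that motivates the stopping rule.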
ISSN: 2161-3915
DOI: 10.1002/ett.4087