Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: In this work, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beamforming, and interference cancellation, in a setting where multiple base stations (BSs) communicate with multiple user equipment (UEs). Because brute-force search has exponential computational complexity, we instead solve this nonconvex optimization problem using deep reinforcement learning (RL) techniques. Modern communication systems are notoriously difficult to model exactly, which limits the use of RL-based algorithms, since interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill-advised to deploy the algorithm in the real world for exploration and learning because of the high cost of failure. In contrast to previously proposed RL-based solutions, such as deep Q-network (DQN) based control, we suggest an offline model-based approach. We specifically consider discrete batch-constrained deep Q-learning (BCQ) and show that performance similar to DQN can be achieved with only a fraction of the data and without any exploration. This maximizes sample efficiency and minimizes the risk of deploying a new algorithm in commercial networks. We provide the entire project resource, including code and data, at https://github.com/Heasung-Kim/safe-rl-deployment-for-5g.
DOI: 10.48550/arxiv.2310.08660
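The abstract centers on discrete batch-constrained deep Q-learning (BCQ), whose key idea is to restrict greedy action selection to actions that the logged (batch) data would plausibly contain. The sketch below illustrates that constrained selection rule only; it is a minimal, hedged example assuming a PyTorch setup, and the network layout, layer sizes, and threshold value `tau` are illustrative assumptions, not taken from the paper or its released code.

```python
# Minimal sketch of discrete BCQ action selection (illustrative; not the paper's code).
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Q-value head plus a behavior-cloning (imitation) head over discrete actions."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, num_actions))
        self.imt = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_actions))

    def forward(self, state: torch.Tensor):
        # Returns Q-values and log-probabilities of the behavior-cloning head.
        return self.q(state), torch.log_softmax(self.imt(state), dim=-1)


def bcq_select_action(net: QNetwork, state: torch.Tensor, tau: float = 0.3) -> int:
    """Pick the highest-Q action among actions the behavior policy deems likely.

    Actions whose imitation probability falls below tau times the maximum
    probability are masked out, keeping the policy close to the logged data.
    """
    with torch.no_grad():
        q, log_pi = net(state.unsqueeze(0))                 # shapes: (1, A), (1, A)
        pi = log_pi.exp()
        allowed = (pi / pi.max(dim=-1, keepdim=True).values) > tau
        q_masked = q.masked_fill(~allowed, float("-inf"))    # disallow out-of-batch actions
        return int(q_masked.argmax(dim=-1).item())


# Hypothetical usage: a 16-dimensional network state and 8 discrete control actions.
net = QNetwork(state_dim=16, num_actions=8)
action = bcq_select_action(net, torch.randn(16))
```

The threshold `tau` is what distinguishes this from plain DQN: with `tau = 0` the rule reduces to ordinary greedy Q-selection, while larger values keep the policy within the support of the offline dataset, which is what makes training without real-world exploration viable.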