Multi-agent deep reinforcement learning (MADRL) meets multi-user MIMO systems
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Multi-agent deep reinforcement learning (MADRL) is a promising approach to
challenging problems in wireless environments involving multiple
decision-makers (or actors) with high-dimensional continuous action spaces. In
this paper, we present a MADRL-based approach that can jointly optimize
precoders to achieve the outer boundary, called the Pareto boundary, of the
achievable rate region for a multiple-input single-output (MISO) interference
channel (IFC). To address two main challenges in the MISO IFC setup, namely multiple
actors (or agents) with partial observability and a multi-dimensional continuous
action space, we adopt a multi-agent deep deterministic policy
gradient (MA-DDPG) framework in which decentralized actors with partial
observability learn a multi-dimensional continuous policy in a centralized
manner with the aid of a shared critic with global information. Meanwhile, we
also address a phase ambiguity issue with the conventional complex
baseband representation of signals widely used in radio communications. In
order to mitigate the impact of phase ambiguity on training performance, we
propose a training method, called phase ambiguity elimination (PAE), that leads
to faster learning and better performance of MA-DDPG in wireless communication
systems. The simulation results show that MA-DDPG is capable of learning a
near-optimal precoding strategy in a MISO IFC environment. To the best of our
knowledge, this is the first work to demonstrate that the MA-DDPG framework can
jointly optimize precoders to achieve the Pareto boundary of the achievable rate
region in a multi-cell multi-user multi-antenna system. |
---|---|
DOI: | 10.48550/arxiv.2109.04986 |
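
The phase ambiguity mentioned in the summary can be illustrated with a short sketch. It assumes the standard rate expression for a K-user MISO interference channel with interference treated as noise, R_i = log2(1 + |h_ii^H w_i|^2 / (sigma^2 + sum over j != i of |h_ji^H w_j|^2)), which this record does not state. All names (`rates`, `rotate`, `fix_phase`) and the channel layout `H[j][i]` are hypothetical choices made for the example. In particular, `fix_phase` is only one possible normalization and is not claimed to be the PAE method of the paper.

```python
import cmath
import math
import random


def inner(h, w):
    """Return h^H w for two equal-length complex vectors."""
    return sum(hk.conjugate() * wk for hk, wk in zip(h, w))


def rates(H, W, noise_power=1.0):
    """Per-user rates in bits/s/Hz (assumed model, interference treated as noise).

    H[j][i] is the channel vector from transmitter j to receiver i.
    W[j] is the precoder of transmitter j.
    """
    K = len(W)
    out = []
    for i in range(K):
        signal = abs(inner(H[i][i], W[i])) ** 2
        interference = sum(
            abs(inner(H[j][i], W[j])) ** 2 for j in range(K) if j != i
        )
        out.append(math.log2(1.0 + signal / (noise_power + interference)))
    return out


def rotate(w, theta):
    """Multiply every entry of w by the common phase factor exp(j * theta)."""
    return [cmath.exp(1j * theta) * wk for wk in w]


def fix_phase(w):
    """Hypothetical normalization: rotate w so its first entry is real and >= 0."""
    return rotate(w, -cmath.phase(w[0]))


def random_vector(n, rng):
    """Draw a length-n vector with Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


rng = random.Random(0)
K, N = 2, 3  # two transmitter-receiver pairs, three transmit antennas each
H = [[random_vector(N, rng) for _ in range(K)] for _ in range(K)]
W = [random_vector(N, rng) for _ in range(K)]

# Give each precoder its own arbitrary common phase rotation.
W_rot = [rotate(w, rng.uniform(0.0, 2.0 * math.pi)) for w in W]

base = rates(H, W)
rotated = rates(H, W_rot)
print(base)
print(rotated)
```

The two printed rate lists agree up to floating-point error, because each rate depends on the precoders only through squared magnitudes. Infinitely many precoders therefore yield the same reward, which is one way to see why a learning agent may benefit from having this redundancy removed.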