Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning
| Field | Value |
|---|---|
| Main authors | |
| Format | Article |
| Language | English |
| Subjects | |
| Online access | Order full text |
Abstract:

In this paper, a novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association in non-terrestrial networks (NTNs). Traditional reinforcement learning (RL) methods for wireless network optimization often rely on manually designed reward functions, which can require extensive parameter tuning. To overcome this limitation, we employ inverse RL (IRL), specifically the GAIL framework, to learn reward functions automatically, without manual design.
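To make the reward-learning idea concrete, the sketch below shows a minimal GAIL-style discriminator whose output doubles as a learned reward. The network shape and the $-\log(1-D)$ surrogate reward are common choices assumed here, not details taken from the paper.

```python
# Minimal GAIL reward-learning sketch (PyTorch); dimensions are illustrative.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs; its output doubles as a learned reward."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def gail_reward(disc: Discriminator, state, action):
    # Common GAIL surrogate reward: -log(1 - D(s, a)); D near 1 means "expert-like".
    logits = disc(state, action)
    return -torch.log(1.0 - torch.sigmoid(logits) + 1e-8)

def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    # Binary cross-entropy: label expert pairs 1, policy-generated pairs 0.
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_s, expert_a)
    policy_logits = disc(policy_s, policy_a)
    return bce(expert_logits, torch.ones_like(expert_logits)) + \
           bce(policy_logits, torch.zeros_like(policy_logits))
```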
We augment this framework with an asynchronous federated learning approach, enabling decentralized multi-satellite systems to collaboratively derive optimal policies.
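As a rough illustration of the asynchronous update pattern, the following sketch merges each satellite agent's parameters into the global model the moment they arrive, with no barrier across agents. The staleness-scaled mixing rule is our assumption, not the paper's aggregation rule.

```python
# Sketch of asynchronous federated aggregation across satellite agents.
import numpy as np

class AsyncFederatedServer:
    def __init__(self, global_weights: np.ndarray, base_mix: float = 0.5):
        self.w = global_weights   # shared policy/reward parameters
        self.version = 0          # increments on every merge
        self.base_mix = base_mix

    def pull(self):
        """An agent fetches the current global model and its version tag."""
        return self.w.copy(), self.version

    def push(self, local_w: np.ndarray, local_version: int):
        """Merge one agent's update immediately, without waiting for others.
        Older (staler) updates get a smaller mixing coefficient."""
        staleness = self.version - local_version
        alpha = self.base_mix / (1.0 + staleness)
        self.w = (1.0 - alpha) * self.w + alpha * local_w
        self.version += 1
        return self.w
```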
The proposed method aims to maximize spectrum efficiency (SE) while meeting minimum information rate requirements for RUEs.
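One plausible way to formalize this objective, in our own notation (the record does not include the paper's model), is:

$$
\begin{aligned}
\max_{\{\mathbf{w}_s\},\,\{x_{s,k}\}} \quad & \sum_{k \in \mathcal{K}} \frac{R_k}{B} \\
\text{s.t.} \quad & R_k \ge R_{\min} \quad \forall k \in \mathcal{K}, \\
& \sum_{k \in \mathcal{K}} x_{s,k} \le Q_s \quad \forall s \in \mathcal{S}, \\
& x_{s,k} \in \{0,1\}, \qquad \|\mathbf{w}_s\|^2 \le P_{\max},
\end{aligned}
$$

where $R_k$ is the achievable rate of RUE $k$, $B$ the bandwidth, $x_{s,k}$ indicates whether RUE $k$ is associated with satellite $s$, $Q_s$ is a per-satellite quota, and $\mathbf{w}_s$ is satellite $s$'s beamforming vector. The binary association variables coupled with the rate constraints are what make the problem non-convex and NP-hard.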
To address the non-convex, NP-hard nature of this problem, we combine many-to-one matching theory with a multi-agent asynchronous federated IRL (MA-AFIRL) framework, which lets agents learn through asynchronous environmental interactions, improving training efficiency and scalability.
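The many-to-one matching step can be pictured as a deferred-acceptance procedure in which RUEs propose to satellites subject to per-satellite quotas. The sketch below is a generic version of that idea; preference lists and quotas are illustrative inputs, and the paper's construction of preferences from channel state is not reproduced.

```python
# Generic many-to-one matching between RUEs and satellites via deferred acceptance.
def many_to_one_match(rue_prefs, sat_prefs, quota):
    """rue_prefs[r]: satellites ordered by RUE r's preference.
    sat_prefs[s]: dict rue -> rank (lower is better). quota[s]: capacity."""
    matched = {s: [] for s in sat_prefs}     # satellite -> accepted RUEs
    next_choice = {r: 0 for r in rue_prefs}  # next satellite index r proposes to
    free = list(rue_prefs)
    while free:
        r = free.pop()
        if next_choice[r] >= len(rue_prefs[r]):
            continue                          # r has exhausted its preference list
        s = rue_prefs[r][next_choice[r]]
        next_choice[r] += 1
        matched[s].append(r)
        if len(matched[s]) > quota[s]:
            # Over quota: evict the satellite's least-preferred current RUE.
            worst = max(matched[s], key=lambda x: sat_prefs[s][x])
            matched[s].remove(worst)
            free.append(worst)
    return matched
```

A matched RUE stays assigned only until a more-preferred RUE displaces it, so the loop converges to a stable assignment respecting every satellite's quota.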
The expert policy is generated using the whale optimization algorithm (WOA), providing the demonstration data used to train the automatic reward function within GAIL.
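For the expert side, a compact WOA loop is sketched below, illustrating how expert solutions could be searched for. Population size, iteration count, bounds, and the objective `f` are placeholders; the paper's problem-specific encoding is not reproduced.

```python
# Compact whale optimization algorithm (WOA) sketch for minimizing f over a box.
import numpy as np

def woa_minimize(f, dim, lo, hi, n_whales=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_whales, dim))
    best = min(X, key=f).copy()
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                  # decreases linearly from 2 to 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1):          # exploit: encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                               # explore: follow a random whale
                    rand = X[rng.integers(n_whales)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                                   # spiral (bubble-net) update
                l = rng.uniform(-1, 1, dim)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
        cand = min(X, key=f)
        if f(cand) < f(best):
            best = cand.copy()
    return best
```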
Simulation results show that the proposed MA-AFIRL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value. The GAIL-driven policy learning establishes a new benchmark for 6G NTN optimization.
DOI: 10.48550/arxiv.2409.18718