Distributed dynamic spectrum access through multi-agent deep recurrent Q-learning in cognitive radio network

Bibliographic details
Published in:	Physical Communication, 2023-06, Vol. 58, p. 102054, Article 102054
Main authors:	Giri, Manish Kumar; Majumder, Saikat
Format:	Article
Language:	English
Keywords:	Deep recurrent Q-network; Distributed spectrum access; Long short-term memory; Q-learning; Reinforcement learning; Resource allocation
Online access:	Full text

Description
Abstract:	This paper addresses the problem of distributed dynamic spectrum access in a cognitive radio (CR) environment using deep recurrent reinforcement learning. Specifically, the network consists of multiple primary users (PUs) transmitting intermittently in their respective channels, while the secondary users (SUs) attempt to access the channels when the PUs are not transmitting. The problem is challenging because of the decentralized nature of the CR network: each SU attempts to access a vacant channel without coordinating with other SUs, which results in collisions and throughput loss. To address this issue, a multi-agent environment is considered in which each SU performs independent reinforcement learning to learn an appropriate policy for transmitting opportunistically so as to minimize collisions with other users. In this article, we propose two long short-term memory (LSTM) based deep recurrent Q-network (DRQN) architectures that exploit the temporal correlation in the transmissions of the various nodes in the network. Furthermore, we investigate the effect of the architecture on the success rate under a varying number of users in the network and under partial channel observations. Simulation results are compared with other existing reinforcement-learning-based techniques to establish the superiority of the proposed method.
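The abstract describes the setting only in words, so the following is a hypothetical, minimal sketch of it, not the authors' method. It assumes: each SU in each time slot either stays silent (action 0) or transmits on one of the PU channels; a transmission succeeds only if that channel's PU is silent and no other SU picked the same channel; and each PU's occupancy follows a two-state Markov chain, which gives the temporal correlation a recurrent learner could exploit. In place of the paper's LSTM-based DRQN, each SU runs independent tabular Q-learning whose state is only its own (last action, last reward) pair, a crude one-step memory standing in for the LSTM history. All function names (`step`, `evolve_pus`, `train`) and parameter values are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

IDLE = 0  # action 0: stay silent; action k in 1..n_channels: transmit on channel k


def step(pu_busy: List[bool], actions: List[int]) -> List[int]:
    """Per-SU reward for one time slot: 1 = successful transmission, 0 otherwise.

    Assumed collision model: a transmission succeeds only if the chosen
    channel's PU is silent and no other SU chose the same channel.
    """
    load: Dict[int, int] = {}
    for a in actions:
        if a != IDLE:
            load[a] = load.get(a, 0) + 1
    return [int(a != IDLE and not pu_busy[a - 1] and load[a] == 1) for a in actions]


def evolve_pus(pu_busy: List[bool], rng: random.Random, p_stay: float = 0.9) -> List[bool]:
    """Assumed PU traffic model: each PU keeps its current busy/idle state with
    probability p_stay, so channel occupancy is correlated across slots."""
    return [b if rng.random() < p_stay else not b for b in pu_busy]


def train(n_channels: int = 3, n_sus: int = 2, slots: int = 20000,
          alpha: float = 0.1, gamma: float = 0.9, eps: float = 0.1,
          seed: int = 0) -> float:
    """Independent epsilon-greedy Q-learning, one learner per SU, no coordination.

    Returns the fraction of SU-slots with a successful transmission over the
    final quarter of training (a simple success-rate metric).
    """
    rng = random.Random(seed)
    n_actions = n_channels + 1
    # One Q-table per SU, keyed by that SU's (last action, last reward).
    q: List[Dict[Tuple[int, int], List[float]]] = [{} for _ in range(n_sus)]
    states = [(IDLE, 0)] * n_sus
    pu_busy = [rng.random() < 0.5 for _ in range(n_channels)]
    successes, counted = 0, 0
    for t in range(slots):
        # Each SU picks an action from its own Q-table only.
        actions = []
        for i in range(n_sus):
            values = q[i].setdefault(states[i], [0.0] * n_actions)
            if rng.random() < eps:
                actions.append(rng.randrange(n_actions))
            else:
                actions.append(max(range(n_actions), key=values.__getitem__))
        rewards = step(pu_busy, actions)
        # Standard one-step Q-learning update, performed independently per SU.
        for i in range(n_sus):
            nxt = (actions[i], rewards[i])
            nxt_values = q[i].setdefault(nxt, [0.0] * n_actions)
            values = q[i][states[i]]
            target = rewards[i] + gamma * max(nxt_values)
            values[actions[i]] += alpha * (target - values[actions[i]])
            states[i] = nxt
        if t >= 3 * slots // 4:
            successes += sum(rewards)
            counted += n_sus
        pu_busy = evolve_pus(pu_busy, rng)
    return successes / counted
```

Calling `train()` returns a success rate between 0 and 1 for the chosen seed. Replacing the (last action, last reward) lookup with an LSTM over a longer observation history, as the paper's DRQN architectures do, is what lets a learner exploit longer-range temporal patterns; this tabular version only illustrates the independent multi-agent structure and the collision-driven reward.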
ISSN:	1874-4907, 1876-3219
DOI:	10.1016/j.phycom.2023.102054