Fast Adaptive Jamming Resource Allocation Against Frequency-Hopping Spread Spectrum in Wireless Sensor Networks via Meta-Deep-Reinforcement-Learning
Published in: IEEE Transactions on Aerospace and Electronic Systems, Dec. 2024, Vol. 60, No. 6, pp. 7676-7693
Format: Article
Language: English
Abstract: Partial-band noise jamming is an important countermeasure against frequency-hopping spread-spectrum technology in wireless sensor networks, and the associated jamming resource allocation (JRA) problem is a high-dimensional combinatorial optimization problem that is NP-hard. Moreover, users can dynamically alter their communication status, for example by changing the channel spectrum distributions of their hopping sets, which poses additional difficulties for JRA. This article develops two methods to address these challenges. First, a deep reinforcement learning (DRL)-based method is proposed for efficient JRA optimization: the jamming scheme of each jamming node is decided sequentially by a policy neural network whose parameters are updated with trust region policy optimization (TRPO) to ensure stable and fast convergence. Second, a meta-TRPO-based method is proposed to improve the generalization capability of the policy network; after meta-training, the meta-policy network can be adapted with just a few fine-tuning steps to quickly obtain a task-specific policy for a new task. Extensive simulation results show that the proposed DRL-based method converges faster than other DRL methods, and that the proposed meta-TRPO-based method rapidly adapts to unseen jamming tasks with only a small number of training trajectories.
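The abstract describes a policy network that decides the jamming scheme of each jamming node sequentially and is trained with TRPO, followed by a meta-learning stage for fast adaptation. The following minimal PyTorch sketch (not the authors' code) illustrates only that sequential decision structure, using a plain policy-gradient update as a stand-in for the TRPO step; the network architecture, the problem sizes N_JAMMERS and N_CHANNELS, and the reward are illustrative assumptions.

```python
# Illustrative sketch only: a policy that assigns a jamming channel to each
# jamming node in turn, trained with a vanilla policy-gradient step as a
# stand-in for the TRPO update described in the abstract. All sizes,
# architecture choices, and the reward function are assumptions.
import torch
import torch.nn as nn

N_JAMMERS, N_CHANNELS = 4, 16   # assumed problem size

class SequentialJammingPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Input: one-hot jammer index + occupancy of channels already jammed
        self.net = nn.Sequential(
            nn.Linear(N_JAMMERS + N_CHANNELS, 64), nn.Tanh(),
            nn.Linear(64, N_CHANNELS),
        )

    def forward(self, jammer_onehot, occupancy):
        return self.net(torch.cat([jammer_onehot, occupancy], dim=-1))

def rollout(policy):
    """Decide one jamming scheme: each node picks a channel sequentially."""
    occupancy = torch.zeros(N_CHANNELS)
    log_probs, scheme = [], []
    for j in range(N_JAMMERS):
        onehot = torch.zeros(N_JAMMERS)
        onehot[j] = 1.0
        dist = torch.distributions.Categorical(logits=policy(onehot, occupancy))
        ch = dist.sample()
        log_probs.append(dist.log_prob(ch))
        occupancy = occupancy.clone()
        occupancy[ch] = 1.0
        scheme.append(int(ch))
    return scheme, torch.stack(log_probs).sum()

def jamming_reward(scheme):
    # Placeholder reward: number of distinct channels covered. A real system
    # would instead score the degradation of the frequency-hopping users' links.
    return float(len(set(scheme)))

policy = SequentialJammingPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):                      # simple policy-gradient loop
    scheme, logp = rollout(policy)
    loss = -jamming_reward(scheme) * logp
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the meta-TRPO variant described in the abstract, such a policy would first be meta-trained across a distribution of jamming tasks and then adapted to a new, unseen task with only a few fine-tuning updates of this kind.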
ISSN: 0018-9251, 1557-9603
DOI: 10.1109/TAES.2024.3418944