Deep Multi-Task Conditional and Sequential Learning for Anti-Jamming
Published in: | IEEE Access, 2021, Vol. 9, pp. 123194-123207 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Multi-task learning offers considerable room for performance improvement over single-task learning when the tasks are related and learned with mutual information. In this work, we analyze the efficiency of using a single-task reinforcement learning algorithm to mitigate jamming attacks with a frequency hopping strategy. Our findings show that single-task learning implementations do not always guarantee the optimal cumulative reward when some of the jammer's parameters are unknown, notably the jamming time-slot length in this case. Therefore, to maximize packet transmission in the presence of a jammer whose parameters are unknown, we propose deep multi-task conditional and sequential learning (DMCSL), a multi-task learning algorithm that builds a transition policy to optimize conditional and sequential tasks. For the anti-jamming system, the proposed model learns two tasks: sensing time and transmission channel selection. DMCSL is a composite of two state-of-the-art reinforcement learning algorithms: a multi-armed bandit and an extended deep Q-network. To improve the algorithm's chance of convergence and of reaching the optimal cumulative reward, we also propose a continuous action-space update algorithm for the sensing-time action space. The simulation results show that DMCSL guarantees better performance than single-task learning by relying on a logarithmically increased action-space sample. Against a random dynamic jamming time-slot, DMCSL achieves about three times better cumulative reward, and against a periodic dynamic jamming time-slot, it improves the cumulative reward by 10%. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2021.3109856 |
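
The abstract describes two tasks learned conditionally and in sequence: a multi-armed bandit for the sensing time, followed by a Q-learner for the transmission channel. The Python sketch below illustrates only that general structure and is not the authors' implementation. It makes several assumptions: a small tabular Q-learner stands in for the paper's extended deep Q-network, the sensing times form a fixed discrete set (the paper's continuous action-space update is not modeled), and every name (`EpsilonGreedyBandit`, `TabularQ`, `choose_actions`, `observation`) is hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Task 1 (sensing time): epsilon-greedy bandit over a discrete set of arms."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental running-mean update of the arm's estimated reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


class TabularQ:
    """Task 2 (channel): tabular Q-learning, an assumed stand-in for a deep Q-network."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.table = {}  # state -> list of action values

    def _row(self, state):
        return self.table.setdefault(state, [0.0] * self.n_actions)

    def select(self, state):
        row = self._row(state)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning target.
        row = self._row(state)
        target = reward + self.gamma * max(self._row(next_state))
        row[action] += self.alpha * (target - row[action])


def choose_actions(bandit, q, observation):
    """Pick the sensing time first, then the channel conditioned on that choice."""
    sensing_idx = bandit.select()
    state = (sensing_idx, observation)  # the channel task sees the sensing decision
    channel = q.select(state)
    return sensing_idx, state, channel
```

In a training loop, one would call `choose_actions`, apply both actions in a jamming simulator, and then feed the resulting reward to `bandit.update` and `q.update`. Placing the sensing index inside the Q-learner's state is one simple way to make the second task conditional on the first; the paper's transition policy may couple the tasks differently.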