Subtask-masked curriculum learning for reinforcement learning with application to UAV maneuver decision-making

Unmanned Aerial Vehicle (UAV) maneuver strategy learning remains a challenge when using Reinforcement Learning (RL) in this sparse reward task. In this paper, we propose Subtask-Masked curriculum learning for RL (SubMas-RL), an efficient RL paradigm that implements curriculum learning and knowledge...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Engineering applications of artificial intelligence 2023-10, Vol.125, p.106703, Article 106703
Hauptverfasser:	Hou, Yueqi, Liang, Xiaolong, Lv, Maolong, Yang, Qisong, Li, Yang
Format:	Artikel
Sprache:	eng
Schlagworte:	Curriculum learning Knowledge transfer Maneuver decision-making Reinforcement learning Unmanned Aerial Vehicle
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Unmanned Aerial Vehicle (UAV) maneuver strategy learning remains a challenge when using Reinforcement Learning (RL) in this sparse reward task. In this paper, we propose Subtask-Masked curriculum learning for RL (SubMas-RL), an efficient RL paradigm that implements curriculum learning and knowledge transfer for UAV maneuver scenarios involving multiple missiles. First, this study introduces a novel concept known as subtask mask to create source tasks from a target task by masking partial subtasks. Then, a subtask-masked curriculum generation method is proposed to generate a sequenced curriculum by alternately conducting task generation and task sequencing. To establish efficient knowledge transfer and avoid negative transfer, this paper employs two transfer techniques, policy distillation and policy reuse, along with an explicit transfer condition that masks irrelevant knowledge. Experimental results demonstrate that our method achieves a 94.8% success rate in the UAV maneuver scenario, where the direct use of reinforcement learning always fails. The proposed RL framework SubMas-RL is expected to learn an effective policy in complex tasks with sparse rewards.
ISSN:	0952-1976
DOI:	10.1016/j.engappai.2023.106703