Deep Recurrent Reinforcement Learning for Intercept Guidance Law under Partial Observability

Bibliographic details
Published in:	Applied artificial intelligence 2024-12, Vol.38 (1)
Main authors:	Wang, Xu; Deng, Yifan; Cai, Yuanli; Jiang, Haonan
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Convergence; Deep learning; Energy consumption; Hypersonic vehicles; Interception; Line of sight; Maneuvering targets; Markov processes; Missile control; Missile defense; R&D; Research & development; Terminal guidance
Online access:	Full text

Description
Abstract:	The rapid development of hypersonic vehicles poses great challenges to missile defense systems. Because successful interception depends heavily on the terminal guidance law, guidance laws for intercepting highly maneuvering targets have attracted increasing attention. Artificial intelligence techniques such as deep reinforcement learning (DRL) have been widely applied to improve guidance-law performance. However, existing DRL guidance laws rarely consider the partial observability of onboard sensors, which limits their engineering application. This paper investigates a deep recurrent reinforcement learning (DRRL)-based guidance method for intercepting maneuvering targets under partial observability. A sequence of previous state observations serves as the input to the policy network, and a recurrent layer is introduced into the networks to extract the hidden information in this temporal sequence and support policy training. The guidance problem is formulated as a partially observable Markov decision process, and a range-weighted reward function that accounts for the line-of-sight rate and energy consumption is designed to guarantee convergence of policy training. The effectiveness of the proposed DRRL guidance law is validated by extensive numerical simulations.
ISSN:	0883-9514 1087-6545
DOI:	10.1080/08839514.2024.2355023
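
The abstract names two ingredients that a short sketch may help make concrete: a sequence of previous observations fed to the policy, and a range-weighted reward built from the line-of-sight rate and energy consumption. The record does not give the paper's actual formulas, so everything below is an assumption made for illustration only: the helper names `ObservationHistory` and `range_weighted_reward`, the zero-padding of the window, the quadratic penalties, the linear range weight, and the gains `k_los` and `k_energy` are all hypothetical and should not be read as the authors' design.

```python
from collections import deque


class ObservationHistory:
    """Fixed-length window of recent observations (hypothetical helper)."""

    def __init__(self, length, obs_dim):
        # Assumption: pad with zeros so the window has a constant shape
        # from the very first step of an episode.
        self.buffer = deque(
            [[0.0] * obs_dim for _ in range(length)], maxlen=length
        )

    def push(self, obs):
        # Appending to a full deque drops the oldest observation.
        self.buffer.append(list(obs))

    def sequence(self):
        # Oldest first, newest last: the order a recurrent layer would read.
        return [list(o) for o in self.buffer]


def range_weighted_reward(los_rate, accel_cmd, rng, rng0,
                          k_los=1.0, k_energy=0.01):
    """Illustrative reward; the functional form and gains are assumptions."""
    # Weight rises linearly from 0 at the initial range rng0 to 1 at zero range,
    # so the line-of-sight-rate penalty matters more as the target gets closer.
    w = 1.0 - min(max(rng / rng0, 0.0), 1.0)
    los_penalty = k_los * los_rate ** 2        # penalize line-of-sight rotation
    energy_penalty = k_energy * accel_cmd ** 2  # penalize control effort
    return -(w * los_penalty + energy_penalty)
```

In a training loop of this kind, each new sensor reading would be pushed into the window, and `sequence()` would be passed to a recurrent policy network (for example an LSTM or GRU layer) in place of a single observation. Whether the paper pads, normalizes, or weights these terms in the same way cannot be determined from this record.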