Maneuver Decision-Making for Autonomous Air Combat Based on FRE-PPO

Bibliographic details
Published in:	Applied sciences 2022-10, Vol.12 (20), p.10230
Main authors:	Zhang, Hongpeng; Wei, Yujie; Zhou, Huan; Huang, Changqiang
Format:	Article
Language:	English
Subjects:	Ablation; Air combat; Algorithms; Artificial intelligence; autonomous air combat; Decision making; Efficiency; final reward estimation; Games; Kinematics; Learning; maneuver decision-making; Maneuvers; Methods; Optimization; proximal policy optimization; Reinforcement; reinforcement learning; Sampling; Simulation

Description
Abstract:	Maneuver decision-making is the core of autonomous air combat, and reinforcement learning is a potential and ideal approach for addressing decision-making problems. However, when reinforcement learning is used for maneuver decision-making for autonomous air combat, it often suffers from awful training efficiency and poor performance of maneuver decision-making. In this paper, an air combat maneuver decision-making method based on final reward estimation and proximal policy optimization is proposed to solve the above problems. First, an air combat environment based on aircraft and missile models is constructed, and an intermediate reward and final reward are designed. Second, the final reward estimation is proposed to replace the original advantage estimation function of the surrogate objective of proximal policy optimization to improve the training performance of reinforcement learning. Third, sampling according to the final reward estimation is proposed to improve the training efficiency. Finally, the proposed method is used in a self-play framework to train agents for maneuver decision-making. Simulations show that final reward estimation and sampling according to final reward estimation are effective and efficient.
ISSN:	2076-3417
DOI:	10.3390/app122010230
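
Illustrative note: the abstract says that a final reward estimation replaces the advantage estimate in the surrogate objective of proximal policy optimization. This record does not give the formula for that estimate, so the sketch below only shows the standard clipped surrogate objective with the weighting signal left as a plain input. The function name `clipped_surrogate`, the parameter names, and the default `clip_eps=0.2` are assumptions made for illustration and are not taken from the article.

```python
def clipped_surrogate(ratios, signals, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    ratios  -- new-policy / old-policy probability ratio for each sample
    signals -- per-sample weighting signal. Standard PPO passes an advantage
               estimate here; the article reportedly substitutes a final
               reward estimation, whose definition is NOT given in this
               record, so callers must supply their own values.
    """
    if len(ratios) != len(signals) or not ratios:
        raise ValueError("ratios and signals must be non-empty and equal in length")
    terms = []
    for ratio, signal in zip(ratios, signals):
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * signal, clipped_ratio * signal))
    return sum(terms) / len(terms)
```

Because the signal is an ordinary argument, the same function serves for a conventional advantage estimate or for any other per-sample estimate, which is the substitution the abstract describes at a high level.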