Multi-UAV air combat cooperative game based on virtual opponent and value attention decomposition policy gradient


Detailed Description

Bibliographic Details
Published in: Expert Systems with Applications, 2025-04, Vol. 267, p. 126069, Article 126069
Authors: Xu, Xiaojie; Wang, Yunfan; Guo, Xian; Huang, Kuihua; Zhang, Xuebo
Format: Article
Language: English
Online access: Full text
Description
Abstract: In the multi-unmanned aerial vehicle (UAV) air-combat confrontation environment, deriving a cooperative policy for friendly aircraft remains a challenge, owing to the higher-order differential dynamics model of the aircraft and the credit-assignment problem in multi-UAV air combat with both conflict and cooperation. In this paper, a novel reinforcement learning method that combines a virtual opponent with value attention decomposition is proposed. In particular, to reduce the training difficulty induced by the higher-order differential dynamics model, the actions of an aircraft are abstracted into game-layer actions and bottom-layer maneuvering actions, where the game-layer actions are modeled as the pose of a virtual opponent. During training, only the game-layer policy is learned; the bottom-layer maneuvering policy is either the default policy or a rule-based policy. To address the credit-assignment problem encountered during multi-UAV cooperative training, the team's total value function is decomposed into individual value functions via an attention mechanism, and the game-layer policy is optimized by integrating the individual value into the gradient computation as a baseline. Finally, the algorithm is verified on a dynamic high-fidelity training platform. The results indicate that the algorithm outperforms the state-of-the-art method in typical multi-UAV air-combat scenarios such as 4V4, 5V5, and 6V6.
• A novel virtual-opponent-based hierarchical policy is proposed.
• A value attention decomposition policy gradient algorithm is developed.
• Experiments are performed on the dynamic high-fidelity training platform.
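The abstract's core mechanism — decomposing the team value into attention-weighted individual values and using the individual value as a baseline in the gradient — can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings, the shared linear value head, and the unlearned dot-product attention are all illustrative stand-ins for the trained networks described in the abstract.

```python
import math
import random

random.seed(0)
n_agents, d = 4, 8  # four friendly UAVs, embedding size (both illustrative)

# Stand-ins for learned per-agent state embeddings h_i
h = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_agents)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical individual value head V_i(h_i): a shared linear map
w_v = [random.gauss(0, 1) for _ in range(d)]
v_indiv = [dot(h_i, w_v) for h_i in h]

# Scaled dot-product attention between agents (raw embeddings stand in
# for learned query/key projections to keep the sketch short)
alpha = [softmax([dot(h[i], h[j]) / math.sqrt(d) for j in range(n_agents)])
         for i in range(n_agents)]

# Team value: attention-weighted mixture of individual values, averaged over agents
v_total = sum(dot(alpha[i], v_indiv) for i in range(n_agents)) / n_agents

# Individual value as the baseline: per-agent advantage fed to the policy gradient
advantages = [v_total - v_i for v_i in v_indiv]
```

In this shape, an agent whose individual value already explains much of the team value receives a small advantage, which is the intuition behind using the individual value as a per-agent baseline for credit assignment.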
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2024.126069