Research on heterogeneous multi-UAV collaborative decision-making method based on improved PPO

Bibliographic details
Published in:	Applied Intelligence (Dordrecht, Netherlands), 2024-10, Vol. 54 (20), pp. 9892-9905
Main authors:	Xu, Lin; Zhang, Xinmiao; Xiao, Dong; Liu, Beihong; Liu, Aixue
Format:	Article
Language:	English
Keywords:	Algorithms; Artificial Intelligence; Collaboration; Computer Science; Decision making; Distributed memory; Machines; Manufacturing; Mechanical Engineering; Multiagent systems; Processes; Reagents; Unmanned aerial vehicles

Description
Abstract:	The Proximal Policy Optimization (PPO) algorithm has difficulty converging in air-sea battle scenarios characterized by high dynamics, strong interference, and a complex state space. To address this, this paper proposes Ray-LAPPO, an algorithm that combines Long Short-Term Memory (LSTM) networks with an attention mechanism and is trained under the distributed training framework Ray. First, the Centralized Training with Decentralized Execution (CTDE) paradigm is adopted to extend PPO to the multi-agent setting, and a policy-entropy term is added to the loss function to encourage exploration by the agents. Second, LSTM layers are added to the actor and critic networks to capture the temporal dependencies among samples that are not independent and identically distributed, improving the learning performance of the UAVs. Third, an attention mechanism is introduced to weight the states at different time steps, establishing a differentially weighted model of the final value function. Finally, simulation experiments in a self-developed heterogeneous-UAV collaborative decision-making environment show that Ray-LAPPO achieves state-of-the-art performance across different scenarios and has potential value for large-scale real-world applications.
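The record gives no equations or code, so what follows is only a minimal plain-Python sketch of two ingredients the abstract names: a clipped PPO surrogate loss with a policy-entropy bonus, and attention weighting over per-time-step hidden states (for example LSTM outputs) to form a context vector that a value head could consume. The clipped-surrogate form is the standard PPO objective; the coefficient values (`clip_eps=0.2`, `entropy_coef=0.01`), the dot-product scoring against a single query vector, and every function name here are illustrative assumptions, not the authors' Ray-LAPPO implementation. The functions compute scalar values only; an actual trainer would use an autodiff framework.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_loss_with_entropy(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    action_probs: Sequence[Sequence[float]],
    clip_eps: float = 0.2,       # assumed value, not taken from the paper
    entropy_coef: float = 0.01,  # assumed value, not taken from the paper
) -> float:
    """Clipped PPO surrogate loss (to be minimized) minus an entropy bonus.

    L = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)) - c_ent * mean(H(pi))
    """
    n = len(advantages)
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the clipped and unclipped objectives.
        surrogate += min(ratio * adv, clipped * adv)
    mean_entropy = sum(categorical_entropy(p) for p in action_probs) / len(action_probs)
    # Subtracting the entropy term rewards more stochastic policies (exploration).
    return -surrogate / n - entropy_coef * mean_entropy


def attention_pool(
    hidden_states: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical dot-product attention over per-time-step hidden vectors.

    Returns (weights, context): one softmax weight per time step and the
    weighted sum of the hidden states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context
```

For example, with two time steps whose hidden states are `[1, 0]` and `[0, 1]` and a query of `[10, 0]`, nearly all attention weight goes to the first step, so the context vector is close to `[1, 0]`. Under CTDE, a pooled context of this kind would typically feed the centralized critic during training, while each UAV's actor acts on its own observations at execution time.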
ISSN:	0924-669X (print), 1573-7497 (electronic)
DOI:	10.1007/s10489-024-05674-w