Research on heterogeneous multi-UAV collaborative decision-making method based on improved PPO
Published in: | Applied intelligence (Dordrecht, Netherlands), 2024-10, Vol.54 (20), p.9892-9905 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | To address the difficulty the Proximal Policy Optimization (PPO) algorithm has in converging in air-sea battle scenarios with high dynamics, strong interference, and a complex state space, this paper proposes the Ray-LAPPO algorithm, based on Long Short-Term Memory (LSTM) and an Attention mechanism, under the distributed training framework Ray. First, the idea of Centralized Training Distributed Execution (CTDE) is adopted to extend PPO to the multi-agent setting, and the policy entropy is added to the loss function to encourage agent exploration. Second, an LSTM network is added to the actor and critic networks to capture the temporal relationships between samples that are not independent and identically distributed, improving the learning performance of the UAVs. In addition, the Attention mechanism is introduced to obtain the states at different time steps and establish a weighted differentiation model of the final value function. Finally, simulation experiments in the self-developed heterogeneous UAV collaborative decision-making environment show that Ray-LAPPO achieves state-of-the-art performance in different scenarios and has potential value for large-scale real-world applications. |
---|---|
ISSN: | 0924-669X, 1573-7497 |
DOI: | 10.1007/s10489-024-05674-w |
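
The summary mentions adding the policy entropy to the PPO loss to encourage exploration. The record does not give the paper's exact formulation, so the following is only a minimal sketch of the standard clipped PPO surrogate with an entropy bonus, not the Ray-LAPPO implementation. The function names, the clipping range `clip_eps`, and the entropy weight `ent_coef` are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy of one discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_entropy_loss(new_logp, old_logp, advantages, action_probs,
                     clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss with an entropy bonus (to be minimized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. action_probs: the current
    policy's full action distribution for each sample.
    clip_eps and ent_coef are assumed values, not taken from the paper.
    """
    n = len(advantages)
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two objectives.
        surrogate += min(ratio * adv, clipped * adv)
    surrogate /= n
    mean_entropy = sum(entropy(p) for p in action_probs) / n
    # Subtracting the entropy term rewards more exploratory policies.
    return -surrogate - ent_coef * mean_entropy
```

With a larger `ent_coef`, the optimizer is pushed toward higher-entropy (more exploratory) policies; setting it to zero recovers the plain clipped objective.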