Enhancing Proximal Policy Optimization for UAV Air Combat with Exploration Boosting and Covariance Matrix Adaptation Strategy

Bibliographic Details
Published in: IEEE Access, 2025-01, p. 1-1
Authors: Zhou, Zhuangfeng; Jiang, Junzhe; Wang, Hongming; Wu, Xiang; Deng, Wenqin; Chen, Xueyun
Format: Article
Language: English
Online access: Full text
Abstract
In the field of unmanned aerial vehicle (UAV) air combat, optimization algorithms must perform exceptionally well to address the complexities of tactical environments and multidimensional control challenges. Thus, developing strategies to enhance exploration, improve stability, and capture the complex dynamics of combat trajectories is crucial. This study proposes a reinforcement learning approach based on Proximal Policy Optimization (PPO), which integrates a Covariance Matrix Adaptation Strategy (CMAS) and Value-Conditional Hidden State Entropy (VCHSE) to optimize UAV air combat strategies. This method effectively addresses intrinsic correlations and enhances the exploration potential of control inputs. We designed a Representation Network (RepNet) to capture dynamic changes during combat, providing guidance for trajectory strategies. Additionally, we employed VCHSE to measure and promote state diversity, ensuring effective exploration of unknown areas and avoiding local optima. To enhance adaptability and stability, we introduced CMAS, which dynamically adjusts the covariance matrix, improving action correlations in high-dimensional spaces and increasing decision-making efficiency. In rigorous tests within a simulated air combat environment, our method achieved an Elo score improvement of nearly 4600 points over the PPO baseline, representing a 41.2% increase after 60 million training steps. An implementation of our algorithm under the CleanRL framework is available at https://github.com/zzfhmy/PPO-CmaVH.
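The core idea behind adapting the covariance matrix, letting the policy model correlations between action dimensions instead of treating them as independent Gaussians, can be illustrated with a minimal PyTorch sketch. This is an illustrative assumption, not the paper's CMAS implementation; the class and parameter names (CorrelatedGaussianPolicy, obs_dim, act_dim) are placeholders.

# Illustrative sketch only (not the authors' CMAS): a Gaussian policy head whose
# actions are drawn from a multivariate normal with a learned full covariance
# matrix, so correlated action dimensions can be modeled jointly.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import MultivariateNormal

class CorrelatedGaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        # Raw matrix from which a lower-triangular Cholesky factor L is built;
        # the implied covariance L @ L.T is then symmetric positive definite.
        self.chol_raw = nn.Parameter(torch.zeros(act_dim, act_dim))

    def _scale_tril(self) -> torch.Tensor:
        lower = torch.tril(self.chol_raw, diagonal=-1)                 # strictly lower part
        diag = torch.diag(F.softplus(torch.diagonal(self.chol_raw)) + 1e-5)  # positive diagonal
        return lower + diag

    def forward(self, obs: torch.Tensor) -> MultivariateNormal:
        mean = self.mean_head(self.body(obs))
        return MultivariateNormal(mean, scale_tril=self._scale_tril())

# Example: sample correlated actions and evaluate log-probs for a PPO-style update.
policy = CorrelatedGaussianPolicy(obs_dim=12, act_dim=4)
dist = policy(torch.randn(8, 12))      # batch of 8 observations
actions = dist.sample()                # shape (8, 4); action dimensions covary
log_probs = dist.log_prob(actions)     # enters the PPO probability ratio

In this sketch the Cholesky parameterization guarantees a valid covariance matrix at every update step; how the paper adapts that matrix during training is described in the full text.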
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3533136