Multi-UAV Collaborative Path Planning using Hierarchical Reinforcement Learning and Simulated Annealing

Bibliographic details
Published in:	International journal of performability engineering 2022-07, Vol.18 (7), p.463
Main authors:	Yuting Cheng; Dongcheng Li; W. Eric Wong; Man Zhao; Dengfeng Mo
Format:	Article
Language:	English
Subjects:	Algorithms; Grid method; Machine learning; Optimization; Optimization algorithms; Path planning; Simulated annealing; Simulation; Swarm intelligence; Two dimensional analysis; Unknown environments; Unmanned aerial vehicles
Online access:	Full text

Description
Summary:	In practice, classical path optimization algorithms perform poorly when applied to an unknown environment; swarm intelligence algorithms need further improvement in agility and accuracy to avoid moving objects in a dynamic environment; and reinforcement learning, a common machine learning approach, may suffer from the curse of dimensionality as scenario complexity grows. To address these problems, this paper proposes using the MAXQ hierarchical reinforcement learning method to reduce dimensionality through abstraction, and combining the leader-wingman approach with a dynamic dead zone to model multi-UAV collaborative formation in a triangular formation. A novel algorithm based on MAXQ and simulated annealing (SA-MAXQ) is designed to solve the unmanned aerial vehicle (UAV) path planning problem and is evaluated in grid-based path planning simulations in 2D scenarios. Q-Learning, ε-Q-Learning, standard MAXQ, and SA-MAXQ are compared in terms of convergence, time consumption, and search steps. The experimental results indicate that SA-MAXQ converges faster, fluctuates less, learns more effectively, takes less time, and finds a better route.
ISSN:	0973-1318
DOI:	10.23940/ijpe.22.07.p1.463474