Improve exploration in deep reinforcement learning for UAV path planning using state and action entropy

Bibliographic Details
Published in: Measurement Science & Technology, 2024-05, Vol. 35 (5), p. 56206
Authors: Lv, Hui; Chen, Yadong; Li, Shibo; Zhu, Baolong; Li, Min
Format: Article
Language: English
Online access: Full text
Description
Abstract: Despite being a widely adopted development framework for unmanned aerial vehicles (UAVs), deep reinforcement learning is often considered sample inefficient. In particular, a UAV agent struggles to fully explore the state and action spaces in environments with sparse rewards. Although some exploration algorithms have been proposed to overcome the sparse-reward challenge, they are not specifically tailored to the UAV platform; applying them to UAV path planning can therefore lead to problems such as unstable training and neglect of the action space, degrading the resulting paths. To address sparse rewards in UAV path planning, we propose an information-theoretic exploration algorithm, Entropy Explorer (EE), designed specifically for the UAV platform. EE generates intrinsic rewards based on state entropy and action entropy to compensate for the scarcity of extrinsic rewards. To further improve sampling efficiency, we propose a framework that integrates EE with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The resulting TD3-EE algorithm is tested in AirSim and compared against benchmark algorithms. The simulation results show that TD3-EE effectively drives the UAV to explore both the state and action spaces, outperforming the benchmark algorithms in path planning.
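To make the reward-shaping idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a k-nearest-neighbor particle estimate of entropy over recent states and actions (a common estimator in entropy-driven exploration) and hypothetical scaling coefficients beta_s and beta_a; the paper's exact estimator, buffer handling, and hyperparameters are not specified in the abstract.

import numpy as np

def knn_entropy_bonus(x, buf, k=5):
    """Per-sample entropy proxy: log distance from x to its k-th nearest
    neighbor in buf (shape [N, d], N > k). Larger distance = more novel."""
    dists = np.linalg.norm(buf - x, axis=1)
    kth = np.partition(dists, k)[k]        # k-th smallest distance
    return float(np.log(kth + 1.0))        # +1 keeps the bonus non-negative

def shaped_reward(r_ext, state, action, state_buf, action_buf,
                  beta_s=0.1, beta_a=0.1):
    """Extrinsic reward plus state- and action-entropy intrinsic bonuses
    (beta_s and beta_a are assumed values, not taken from the paper)."""
    r_state = knn_entropy_bonus(state, state_buf)      # state-space novelty
    r_action = knn_entropy_bonus(action, action_buf)   # action-space novelty
    return r_ext + beta_s * r_state + beta_a * r_action

# Toy usage with a hypothetical 12-D UAV state and 4-D continuous action:
rng = np.random.default_rng(0)
state_buf = rng.normal(size=(256, 12))
action_buf = rng.uniform(-1.0, 1.0, size=(256, 4))
r = shaped_reward(0.0, rng.normal(size=12),
                  rng.uniform(-1.0, 1.0, size=4), state_buf, action_buf)

In a TD3-EE-style training loop, the shaped reward would replace the raw environment reward when transitions are stored in the TD3 replay buffer, so the critic learns from the entropy-augmented signal while the sparse extrinsic reward still anchors the task objective.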
ISSN: 0957-0233, 1361-6501
DOI: 10.1088/1361-6501/ad2663