NPE-DRL: Enhancing Perception Constrained Obstacle Avoidance with Non-Expert Policy Guided Reinforcement Learning

Bibliographic details
Published in: IEEE Transactions on Artificial Intelligence, 2024-09, p. 1-15
Main authors: Zhang, Yuhang; Yan, Chao; Xiao, Jiaping; Feroskhan, Mir
Format: Article
Language: English
Description
Summary: Obstacle avoidance under constrained visual perception presents a significant challenge, requiring rapid detection and decision-making within partially observable environments, particularly for unmanned aerial vehicles (UAVs) maneuvering agilely in three-dimensional space. Compared to traditional methods, obstacle avoidance algorithms based on deep reinforcement learning (DRL) offer a better comprehension of the uncertain operational environment in an end-to-end manner, reducing computational complexity and enhancing flexibility and scalability. However, the inherent trial-and-error learning mechanism of DRL requires numerous iterations for policy convergence, which makes it sample-inefficient. Meanwhile, existing sample-efficient obstacle avoidance approaches that leverage imitation learning often rely heavily on offline expert demonstrations, which are not always feasible to collect in hazardous environments. To address these challenges, we propose a novel obstacle avoidance approach based on Non-Expert Policy Enhanced DRL (NPE-DRL). This approach integrates a fundamental DRL framework with prior knowledge derived from non-expert policy-guided imitation learning. During the training phase, the agent starts by imitating, online, the actions generated by the non-expert policy during interactions and progressively shifts toward autonomously exploring the environment to generate the optimal policy. Both simulation and physical experiments validate that our approach improves sample efficiency and achieves a better exploration-exploitation balance in both virtual and real-world flights. Additionally, our NPE-DRL-based obstacle avoidance approach shows better adaptability in complex environments characterized by larger scales and denser obstacle configurations, demonstrating a significant improvement in UAVs' obstacle avoidance capability. Code available at https://github.com/zzzzzyh111/NonExpert-Guided-Visual-UAV-Navigation-Gazebo .
ISSN: 2691-4581
DOI: 10.1109/TAI.2024.3464510
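
Illustrative note: the summary describes an agent that first imitates a non-expert policy online and then progressively shifts toward its own exploration. The record does not say how that shift is scheduled, so the following is only a minimal sketch of one common way such a hand-over could be arranged, not the authors' method. The linear decay, the names guidance_weight, select_action and blended_loss, and the decay_steps parameter are all assumptions introduced here for illustration.

```python
import random


def guidance_weight(step: int, decay_steps: int) -> float:
    """Hypothetical schedule: weight on the non-expert policy.

    Decays linearly from 1.0 at step 0 to 0.0 at decay_steps and stays
    at 0.0 afterwards. The linear shape is an assumption; the summary
    only says the shift is progressive.
    """
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    return max(0.0, 1.0 - step / decay_steps)


def select_action(step, decay_steps, non_expert_action, agent_action, rng):
    """Follow the non-expert action with probability guidance_weight,
    otherwise use the agent's own (exploratory) action."""
    w = guidance_weight(step, decay_steps)
    return non_expert_action if rng.random() < w else agent_action


def blended_loss(rl_loss: float, imitation_loss: float,
                 step: int, decay_steps: int) -> float:
    """Convex combination of an imitation term and an RL term.

    Early in training the imitation term dominates; once the weight
    reaches 0.0 only the RL term remains.
    """
    w = guidance_weight(step, decay_steps)
    return w * imitation_loss + (1.0 - w) * rl_loss


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 250, 500, 750, 1000):
        w = guidance_weight(step, decay_steps=1000)
        a = select_action(step, 1000, "non_expert", "agent", rng)
        print(f"step={step:4d} weight={w:.2f} action_source={a}")
```

With this kind of schedule the agent is fully guided at step 0 and fully autonomous from decay_steps onward; a real implementation would compute the two loss terms from network outputs rather than take them as plain floats.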