NavRL: Learning Safe Flight in Dynamic Environments
Format: | Article |
Language: | English |
Abstract: | Safe flight in dynamic environments requires autonomous unmanned aerial
vehicles (UAVs) to make effective decisions when navigating cluttered spaces
with moving obstacles. Traditional approaches often decompose decision-making
into hierarchical modules for prediction and planning. Although these
handcrafted systems can perform well in specific settings, they might fail if
environmental conditions change and often require careful parameter tuning.
Additionally, their solutions could be suboptimal due to the use of inaccurate
mathematical model assumptions and simplifications aimed at achieving
computational efficiency. To overcome these limitations, this paper introduces
the NavRL framework, a deep reinforcement learning-based navigation method
built on the Proximal Policy Optimization (PPO) algorithm. NavRL utilizes our
carefully designed state and action representations, allowing the learned
policy to make safe decisions in the presence of both static and dynamic
obstacles, with zero-shot transfer from simulation to real-world flight.
Furthermore, the proposed method adopts a simple but effective safety shield
for the trained policy, inspired by the concept of velocity obstacles, to
mitigate potential failures associated with the black-box nature of neural
networks. To accelerate convergence, we implement the training pipeline
using NVIDIA Isaac Sim, enabling parallel training with thousands of
quadcopters. Simulation and physical experiments show that our method ensures
safe navigation in dynamic environments and yields the fewest collisions among
the compared benchmark methods in scenarios with dynamic obstacles. |
DOI: | 10.48550/arxiv.2409.15634 |
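The abstract says only that the safety shield is "inspired by the concept of velocity obstacles" and gives no further details. The Python sketch below is a generic, hypothetical illustration of that concept, not NavRL's actual shield. It assumes a planar (2D) simplification, disc-shaped obstacles that keep a constant velocity, a fixed look-ahead horizon, and a sampled grid of replacement velocities with a hover fallback. The names `Obstacle`, `in_velocity_obstacle`, and `shield`, and all default parameter values, are illustrative assumptions.

```python
# Hypothetical sketch of a velocity-obstacle (VO) safety shield; not the NavRL implementation.
import math
from dataclasses import dataclass


@dataclass
class Obstacle:
    """Hypothetical planar obstacle: position (m), velocity (m/s), radius (m)."""
    px: float
    py: float
    vx: float
    vy: float
    radius: float


def in_velocity_obstacle(px, py, vx, vy, obs, robot_radius, horizon):
    """True if flying at (vx, vy) from (px, py) comes within robot_radius + obs.radius
    of the obstacle during [0, horizon], assuming the obstacle keeps its velocity."""
    rx, ry = obs.px - px, obs.py - py      # obstacle position relative to robot
    ux, uy = vx - obs.vx, vy - obs.vy      # robot velocity relative to obstacle
    r = robot_radius + obs.radius          # combined collision radius
    uu = ux * ux + uy * uy
    # Time of closest approach, clamped to the look-ahead window.
    t = 0.0 if uu == 0.0 else max(0.0, min(horizon, (rx * ux + ry * uy) / uu))
    dx, dy = rx - ux * t, ry - uy * t      # separation at that time
    return dx * dx + dy * dy < r * r


def shield(action, robot_pos, obstacles, robot_radius=0.3, horizon=2.0,
           vmax=2.0, n_angles=16, n_speeds=5):
    """Pass the policy's velocity through if it lies outside every truncated VO;
    otherwise return the closest sampled safe velocity, or hover (0, 0) if none is safe."""
    px, py = robot_pos

    def is_safe(v):
        return not any(in_velocity_obstacle(px, py, v[0], v[1], o, robot_radius, horizon)
                       for o in obstacles)

    if is_safe(action):
        return action
    # Candidate velocities: hover plus a polar grid of speeds and headings.
    candidates = [(0.0, 0.0)]
    for i in range(1, n_speeds + 1):
        speed = vmax * i / n_speeds
        for k in range(n_angles):
            ang = 2.0 * math.pi * k / n_angles
            candidates.append((speed * math.cos(ang), speed * math.sin(ang)))
    safe = [c for c in candidates if is_safe(c)]
    if not safe:
        return (0.0, 0.0)  # fallback: command hover
    # Smallest change to the policy's action among the safe candidates.
    return min(safe, key=lambda c: (c[0] - action[0]) ** 2 + (c[1] - action[1]) ** 2)


if __name__ == "__main__":
    obstacles = [Obstacle(px=2.0, py=0.0, vx=0.0, vy=0.0, radius=0.3)]
    # Heading straight at the obstacle is unsafe, so the shield substitutes a nearby safe velocity.
    print(shield((1.0, 0.0), (0.0, 0.0), obstacles))
```

Choosing the safe candidate nearest to the policy's output means the shield leaves safe actions untouched and alters unsafe ones only as much as the sample grid requires.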