Autonomous obstacle avoidance and target tracking of UAV: Transformer for observation sequence in reinforcement learning

Bibliographic details
Published in:	Knowledge-Based Systems, 2024-04, Vol. 290, p. 111604, Article 111604
Main authors:	Jiang, Weilai; Cai, Tianqing; Xu, Guoqiang; Wang, Yaonan
Format:	Article
Language:	English
Subjects:	Autonomous obstacle avoidance; Deep reinforcement learning; Target tracking; Transformer; Unmanned Aerial Vehicle (UAV)
Online access:	Full text

Description
Summary:	Reinforcement learning (RL) is an effective approach to autonomous obstacle avoidance and target tracking for Unmanned Aerial Vehicles (UAVs). In practical environments, however, communication interruptions or delays often cause loss of transmitted information, which greatly reduces the UAV's tracking success rate. Most existing methods assume ideal environments in which the UAV can fully obtain target information, an assumption that has significant limitations in engineering practice. To address this problem, we first formalize the UAV task as a continuous partially observable Markov decision process (POMDP) that accounts for the loss of target state information. We then propose a new algorithm, the Transformer for Observing Sequences in Reinforcement Learning (TOSRL), which uses a transformer encoder to process the sequence of observations from the past to the present, and further uses the transformer encoder to make decisions based on the extracted features. Compared with Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks, transformer encoders not only process their inputs in parallel but also alleviate the tendency of those models to forget information over long sequences. The experimental results show that TOSRL significantly outperforms state-of-the-art algorithms in many different scenarios, achieving the highest tracking rate and the lowest crash rate with the fewest parameters.
• The work considers target information loss and the resulting tracking failures caused by communication interruptions or delays in UAV tracking, which makes it more practical for engineering use.
• Using historical observations, the UAV compensates for lost target states, enabling decisions based on complete data for continuous and robust tracking.
• The TOSRL framework uses a Transformer encoder to extract features from UAV observation sequences, mitigating the memory loss caused by long sequences and ensuring robust tracking.
• The experimental results show that a TOSRL-trained UAV achieves the highest tracking rate and the lowest crash rate with the fewest parameters.
ISSN:	0950-7051, 1872-7409
DOI:	10.1016/j.knosys.2024.111604
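
The summary describes encoding a history of observations in which target information is sometimes lost. As a rough illustration of that idea only, and not the authors' TOSRL implementation, the following Python sketch keeps a sliding window of observations and summarizes it with one step of scaled dot-product self-attention. Everything here is an assumption made for illustration: the function names (`make_observation`, `self_attention`, `encode_history`), the zero-filling of a lost target state with an availability flag, the window length of 4, the 30% loss probability, and the use of identity projections in place of learned weights.

```python
import math
import random
from collections import deque


def make_observation(own_state, target_state, lost):
    """Build one observation vector (hypothetical layout).

    When the target state is lost, its entries are zero-filled and the
    trailing availability flag is set to 0.0; otherwise the flag is 1.0.
    """
    target_part = [0.0] * len(target_state) if lost else list(target_state)
    return list(own_state) + target_part + [0.0 if lost else 1.0]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the inputs themselves (identity
    projections), so this shows the mechanism without learned weights.
    """
    dim = len(sequence[0])
    outputs = []
    for query in sequence:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in sequence
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, sequence))
             for i in range(dim)]
        )
    return outputs


def encode_history(history):
    """Return the attended feature vector at the most recent time step."""
    return self_attention(list(history))[-1]


# Simulate a short flight in which the target state is randomly lost.
random.seed(0)
history = deque(maxlen=4)  # sliding window of recent observations
for step in range(6):
    own = [float(step), 0.0]
    target = [float(step) + 1.0, 2.0]
    lost = random.random() < 0.3  # assumed loss probability
    history.append(make_observation(own, target, lost))

features = encode_history(history)
```

In a full agent, `features` would be passed to a policy network to choose an action; a trained transformer encoder would also add learned projections, multiple heads, positional information, and feed-forward layers, none of which are modeled in this sketch.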