An Anchor-Free Vehicle Detection Algorithm in Aerial Image Based on Context Information and Transformer

Bibliographic details
Published in:	IEEE Geoscience and Remote Sensing Letters, 2022, Vol. 19, pp. 1-5
Main authors:	Zhou, Wangcheng; Shen, Jiaquan; Liu, Ningzhong; Xia, Shifeng; Sun, Han
Format:	Article
Language:	English
Subjects:	Agriculture; Algorithms; Anchor-free; Attention mechanism; Datasets; Deep learning; Detection; Dynamic activation function (DAF); Feature extraction; Heuristic algorithms; Imagery; Machine learning; Modules; Object detection; Object recognition; Occlusion; Semantics; Transformer; Transformers; Vehicle detection; Vehicle dynamics

Description
Abstract:	Vehicle detection in aerial images is an essential and challenging task widely used in industry and agriculture. Deep learning technology has recently developed rapidly and achieved good object detection results. However, the background of aerial images is complex, and targets are densely distributed and sometimes occluded. For densely distributed targets, we need to predict at each feature point. With a complex background and target occlusion, it is often difficult to determine whether a location contains a target if the model focuses only on local information. Therefore, we need a global perspective and contextual information to help train the model. This letter proposes a new anchor-free small object detection algorithm, which improves feature extraction by fusing contextual semantic information. In addition, our network uses a dynamic activation function (DAF), which helps calculate the activation function value for each point from a global perspective. Moreover, we use a channel attention module, and a transformer as the spatial attention module, to help the network efficiently obtain global information. We evaluate the effectiveness of our method on the public DLR-3K dataset and the Vehicle Detection in Aerial Imagery (VEDAI) dataset, where the average precision (AP) reaches 0.896 and 0.875, respectively.
ISSN:	1545-598X, 1558-0571
DOI:	10.1109/LGRS.2022.3202186