mSODANet: A network for multi-scale object detection in aerial images using hierarchical dilated convolutions
Published in: | Pattern Recognition 2022-06, Vol. 126, p. 108548, Article 108548 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A network for multi-scale object detection in aerial images using hierarchical dilated convolutions (mSODANet) is proposed to detect objects of various scales in a visual scene and to capture effective scene-level contextual information.
• A bi-directional feature aggregation module (BFAM) is used to incorporate dense multi-scale contextual features.
• The proposed approach is demonstrated on three challenging aerial imagery datasets: VisDrone2019, DOTA (OBB & HBB), and NWPU VHR-10.
Object detection in aerial images is one of the most common tasks across a wide range of computer vision applications. However, it is especially challenging in this setting for the following reasons: (a) pixel occupancy varies among objects of different scales, (b) the distribution of objects in aerial images is not uniform, (c) the appearance of an object varies with viewpoint and illumination conditions, and (d) the number of objects, even of the same type, varies across images. To address these issues, we propose a novel network for multi-scale object detection in aerial images using hierarchical dilated convolutions, called mSODANet. In particular, we explore a hierarchical dilated network that uses parallel dilated convolutions to learn the contextual information of different types of objects at multiple scales and multiple fields of view. The hierarchical dilated network captures the visual information of an aerial image more effectively and enhances the detection capability of the model. Extensive experiments on three challenging, publicly available datasets, i.e., VisDrone2019, DOTA (OBB & HBB), and NWPU VHR-10, demonstrate the effectiveness of the proposed mSODANet, which achieves state-of-the-art performance on all three datasets. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2022.108548 |
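
The summary describes parallel dilated convolutions that observe the same input at multiple fields of view. The sketch below is only an illustration of that general idea in one dimension, not the mSODANet architecture: the helper names, the kernel `[1, 1, 1]`, the toy signal, and the dilation rates `(1, 2, 3)` are all assumptions chosen for readability. It relies on the standard relation that a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions.

```python
# Illustrative sketch only: 1-D parallel dilated convolutions in pure Python.
# All names, the kernel, and the dilation rates are hypothetical; the record
# above does not specify mSODANet's actual layers or hyperparameters.

def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated cross-correlation (no padding, stride 1)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def parallel_dilated_branches(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates to the same input."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 1, 1]
branches = parallel_dilated_branches(signal, kernel, dilations=(1, 2, 3))

for d, out in branches.items():
    # Larger dilation -> wider field of view with the same number of weights.
    print(f"dilation={d} span={effective_kernel_size(len(kernel), d)} out={out}")
```

Each branch uses the same three weights, yet the span grows from 3 to 5 to 7 input positions as the dilation increases, which is why dilated branches can gather wider context without adding parameters. A real detector would apply this in two dimensions with learned weights and would need padding so that branch outputs align before they are combined.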