BirdNet+: Two-Stage 3D Object Detection in LiDAR Through a Sparsity-Invariant Bird's Eye View
Published in: IEEE Access, 2021, Vol. 9, pp. 160299-160316
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Autonomous navigation relies on an accurate understanding of the elements in the surroundings. Among on-board perception tasks, 3D object detection identifies the dynamic objects that maps cannot register, making it key for safe navigation. The task therefore often relies on LiDAR data, which faithfully represents the scene geometry. However, although raw laser point clouds contain rich features for object detection, more compact representations such as the bird's eye view (BEV) projection are usually preferred to meet the time requirements of the control loop. This paper presents an end-to-end object detection network based on the well-known Faster R-CNN architecture that takes BEV images as input and produces the final 3D boxes. Our regression branches infer not only the axis-aligned bounding boxes but also the rotation angle, height, and elevation of the objects in the scene. The proposed network achieves state-of-the-art results for car, pedestrian, and cyclist detection in a single forward pass when evaluated on the KITTI 3D Object Detection Benchmark, exceeding 64% mAP 3D at the Moderate difficulty. Further experiments on the challenging nuScenes dataset show that both the method and the proposed BEV representation generalize across different LiDAR devices and a wider set of object categories, reaching more than 30% mAP with a single LiDAR sweep and almost 40% mAP with the usual 10-sweep accumulation.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3131389
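The abstract describes projecting the LiDAR point cloud into a bird's eye view image before detection. As a rough illustration only (not the paper's exact encoding), the sketch below shows a minimal NumPy version of such a projection; the grid extents, cell size, and the three channels (maximum height, mean intensity, log-normalized density) are assumptions in the spirit of common BEV encodings, not values taken from the paper.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 70.0), y_range=(-35.0, 35.0),
                      z_range=(-2.5, 1.5), cell_size=0.1):
    """Project an (N, 4) LiDAR cloud (x, y, z, intensity) onto a 3-channel BEV image."""
    x, y, z, intensity = points.T
    # Keep only points inside the region of interest.
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

    h = int(round((x_range[1] - x_range[0]) / cell_size))
    w = int(round((y_range[1] - y_range[0]) / cell_size))
    # Forward (x) maps to image rows, left (y) to image columns.
    rows = np.clip(((x_range[1] - x) / cell_size).astype(np.int64), 0, h - 1)
    cols = np.clip(((y_range[1] - y) / cell_size).astype(np.int64), 0, w - 1)
    flat = rows * w + cols

    # Channel 0: maximum height per cell, normalized to [0, 1].
    height = np.zeros(h * w)
    np.maximum.at(height, flat, (z - z_range[0]) / (z_range[1] - z_range[0]))

    # Channel 1: mean reflectance intensity per cell.
    counts = np.bincount(flat, minlength=h * w)
    sums = np.bincount(flat, weights=intensity, minlength=h * w)
    mean_int = np.divide(sums, counts, out=np.zeros(h * w), where=counts > 0)

    # Channel 2: point density per cell, log-normalized (assumed saturation at
    # 64 points per cell; an illustrative choice, not the paper's).
    density = np.minimum(1.0, np.log1p(counts) / np.log(64.0))

    return np.stack([height, mean_int, density], axis=-1).reshape(h, w, 3).astype(np.float32)
```

A detector of the kind described can then consume the result as an ordinary 3-channel image. Note that the sparsity-invariant BEV named in the title presumably normalizes the density channel for the sensor's beam configuration so that the representation transfers across LiDAR devices; this sketch does not reproduce that normalization.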