3D Object Detection With Multi-Frame RGB-Lidar Feature Alignment

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, pp. 143138-143149
Authors: Ercelik, Emec; Yurtsever, Ekim; Knoll, Alois
Format: Article
Language: English
Online access: Full text
Abstract: Single-frame 3D detection is a well-studied vision problem with dedicated benchmarks and a large body of work, and this knowledge has translated to a myriad of real-world applications. However, frame-by-frame detection suffers from inconsistencies between independent frames, such as flickering bounding box shape and occasional misdetections. Safety-critical applications may not tolerate these inconsistencies. For example, automated driving systems require robust and temporally consistent detection output for planning: a vehicle's 3D bounding box shape should not change dramatically across independent frames. Against this backdrop, we propose a multi-frame RGB-lidar feature alignment strategy to refine 3D detection outputs and increase their temporal consistency. Our main contribution is aligning and aggregating object-level features from multiple past frames to improve 3D detection quality in the inference frame. First, a Frustum PointNet architecture extracts a frustum-cropped point cloud from RGB and lidar data for each object, frame by frame. After tracking, the multi-frame frustum features of each unique object are fused through a Gated Recurrent Unit (GRU) to obtain a refined 3D box shape and orientation. The proposed method improves 3D detection performance on the KITTI tracking dataset by more than 4% for all classes compared to the vanilla Frustum PointNet baseline. We also conducted extensive ablation studies to show the efficacy of our hyperparameter selections. Code is available at https://github.com/emecercelik/Multi-frame-3D-detection.git.
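The abstract describes a per-object temporal fusion step: frustum features extracted from past frames of a tracked object are aggregated with a GRU before the refined box is regressed. Below is a minimal PyTorch sketch of that fusion stage only. The feature and hidden dimensions, the box parameterization, and the regression head are illustrative assumptions, not the paper's actual architecture (which is in the linked repository).

import torch
import torch.nn as nn

class MultiFrameBoxRefiner(nn.Module):
    """Fuses per-frame frustum features of one tracked object with a GRU,
    then regresses a refined 3D box. Dimensions are illustrative, not the
    values used in the paper."""

    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        # GRU consumes one frustum feature vector per past frame
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # 7-DoF box (x, y, z, w, l, h, heading) -- a common parameterization,
        # assumed here for illustration
        self.box_head = nn.Linear(hidden_dim, 7)

    def forward(self, frustum_feats):
        # frustum_feats: (batch, n_frames, feat_dim), ordered oldest -> newest,
        # one feature vector per past frame of the same tracked object
        _, last_hidden = self.gru(frustum_feats)      # (1, batch, hidden_dim)
        return self.box_head(last_hidden.squeeze(0))  # (batch, 7)

# Usage: 4 tracked objects, each observed in 5 past frames
refiner = MultiFrameBoxRefiner()
feats = torch.randn(4, 5, 512)
boxes = refiner(feats)
print(boxes.shape)  # torch.Size([4, 7])

In this reading, tracking supplies the per-object frame association, so the GRU only ever sees a sequence of features belonging to one object; the recurrent state smooths shape and orientation estimates across frames, which is what drives the reported consistency gains.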
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3120261