3D Object Detection With Multi-Frame RGB-Lidar Feature Alignment

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, pp. 143138-143149
Authors: Ercelik, Emec; Yurtsever, Ekim; Knoll, Alois
Format: Article
Language: English
Online access: Full text
Abstract: Single-frame 3D detection is a well-studied vision problem with dedicated benchmarks and a large body of work, and this knowledge has translated to a myriad of real-world applications. However, frame-by-frame detection suffers from inconsistencies between independent frames, such as flickering bounding box shape and occasional misdetections. Safety-critical applications may not tolerate these inconsistencies. For example, automated driving systems require robust and temporally consistent detection output for planning: a vehicle's 3D bounding box shape should not change dramatically across independent frames. Against this backdrop, we propose a multi-frame RGB-lidar feature alignment strategy to refine 3D detection outputs and increase their temporal consistency. Our main contribution is aligning and aggregating object-level features from multiple past frames to improve 3D detection quality in the inference frame. First, a Frustum PointNet architecture extracts a frustum-cropped point cloud from RGB and lidar data for each object, frame by frame. After tracking, the multi-frame frustum features of each unique object are fused through a Gated Recurrent Unit (GRU) to obtain a refined 3D box shape and orientation. The proposed method improves 3D detection performance on the KITTI tracking dataset by more than 4% for all classes compared to the vanilla Frustum PointNet baseline. We also conducted extensive ablation studies to show the efficacy of our hyperparameter selections. Code is available at https://github.com/emecercelik/Multi-frame-3D-detection.git.
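The abstract describes a per-object temporal fusion step: frustum features extracted from past frames of a tracked object are aggregated with a GRU before the refined box is regressed. Below is a minimal PyTorch sketch of that fusion stage only. The feature and hidden dimensions, the box parameterization, and the regression head are illustrative assumptions, not the paper's actual architecture (which is in the linked repository).

import torch
import torch.nn as nn

class MultiFrameBoxRefiner(nn.Module):
    """Fuses per-frame frustum features of one tracked object with a GRU,
    then regresses a refined 3D box. Dimensions are illustrative, not the
    values used in the paper."""

    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        # GRU consumes one frustum feature vector per past frame
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # 7-DoF box (x, y, z, w, l, h, heading) -- a common parameterization,
        # assumed here for illustration
        self.box_head = nn.Linear(hidden_dim, 7)

    def forward(self, frustum_feats):
        # frustum_feats: (batch, n_frames, feat_dim), ordered oldest -> newest,
        # one feature vector per past frame of the same tracked object
        _, last_hidden = self.gru(frustum_feats)      # (1, batch, hidden_dim)
        return self.box_head(last_hidden.squeeze(0))  # (batch, 7)

# Usage: 4 tracked objects, each observed in 5 past frames
refiner = MultiFrameBoxRefiner()
feats = torch.randn(4, 5, 512)
boxes = refiner(feats)
print(boxes.shape)  # torch.Size([4, 7])

In this reading, tracking supplies the per-object frame association, so the GRU only ever sees a sequence of features belonging to one object; the recurrent state smooths shape and orientation estimates across frames, which is what drives the reported consistency gains.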
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3120261