3D Object Detection With Multi-Frame RGB-Lidar Feature Alignment
| Published in: | IEEE Access, 2021, Vol. 9, pp. 143138-143149 |
|---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | English |
| Online access: | Full text |
Abstract: Single-frame 3D detection is a well-studied vision problem with dedicated benchmarks and a large body of work. This knowledge has translated to a myriad of real-world applications. However, frame-by-frame detection suffers from inconsistencies between independent frames, such as flickering bounding box shape and occasional misdetections. Safety-critical applications may not tolerate these inconsistencies. For example, automated driving systems require robust and temporally consistent detection output for planning. A vehicle's 3D bounding box shape should not change dramatically across independent frames. Against this backdrop, we propose a multi-frame RGB-Lidar feature alignment strategy to refine and increase the temporal consistency of 3D detection outputs. Our main contribution is aligning and aggregating object-level features using multiple past frames to improve 3D detection quality in the inference frame. First, a Frustum PointNet architecture extracts a frustum-cropped point cloud using RGB and lidar data for each object frame-by-frame. After tracking, multi-frame frustum features of unique objects are fused through a Gated Recurrent Unit (GRU) to obtain a refined 3D box shape and orientation. The proposed method improves 3D detection performance on the KITTI tracking dataset by more than 4% for all classes compared to the vanilla Frustum PointNet baseline. We also conducted extensive ablation studies to show the efficacy of our hyperparameter selections. Code is available at https://github.com/emecercelik/Multi-frame-3D-detection.git.
| ISSN: | 2169-3536 |
|---|---|
| DOI: | 10.1109/ACCESS.2021.3120261 |
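The pipeline the abstract describes, where per-frame frustum features of one tracked object are fused by a GRU into a refined representation, can be sketched as follows. This is a minimal, illustrative pure-Python sketch with random, untrained weights; the names (`GRUCell`, `fuse_track_features`) and the toy 4-D feature size are assumptions for illustration, not the authors' implementation, which operates on high-dimensional Frustum PointNet features.

```python
import math
import random

def matvec(W, x):
    # Dense matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

class GRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""
    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.dim = dim
        self.Wz, self.Uz = mat(), mat()  # update gate weights
        self.Wr, self.Ur = mat(), mat()  # reset gate weights
        self.Wh, self.Uh = mat(), mat()  # candidate-state weights

    def step(self, x, h):
        # Standard GRU gate equations (biases omitted for brevity).
        z = sigmoid([a + b for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))])
        r = sigmoid([a + b for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))])
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a + b)
                   for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # New state: convex combination of previous state and candidate.
        return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def fuse_track_features(frame_features, cell):
    """Run the GRU over a tracked object's per-frame frustum features.
    The final hidden state would feed the refined box/orientation heads."""
    h = [0.0] * cell.dim
    for x in frame_features:
        h = cell.step(x, h)
    return h

# Toy example: one object tracked over 3 frames, 4-D feature per frame.
cell = GRUCell(dim=4)
track = [[0.20, -0.10, 0.50, 0.00],
         [0.30, -0.20, 0.40, 0.10],
         [0.25, -0.15, 0.45, 0.05]]
fused = fuse_track_features(track, cell)
```

In the paper's setting, the per-frame inputs would be the object-level features extracted by Frustum PointNet after tracking has associated detections across frames, so the recurrence sees a temporally ordered feature sequence per unique object.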