To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | CVPR 2021. 3D object detection is vital for many robotics applications. For tasks where
a 2D perspective range image exists, we propose to learn a 3D representation
directly from this range image view. To this end, we designed a 2D
convolutional network architecture that carries the 3D spherical coordinates of
each pixel throughout the network. Its layers can consume any arbitrary
convolution kernel in place of the default inner product kernel and exploit the
underlying local geometry around each pixel. We outline four such kernels: a
dense kernel according to the bag-of-words paradigm, and three graph kernels
inspired by recent graph neural network advances: the Transformer, the
PointNet, and the Edge Convolution. We also explore cross-modality fusion with
the camera image, facilitated by operating in the perspective range image view.
Our method performs competitively on the Waymo Open Dataset and improves the
state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also
efficient in that our smallest model, which still outperforms the popular
PointPillars in quality, requires 180 times fewer FLOPs and model parameters. |
---|---|
DOI: | 10.48550/arxiv.2106.13381 |
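
The summary describes carrying the 3D spherical coordinates of each range image pixel through the network and using the local geometry around each pixel. As a minimal sketch of that idea, and not the authors' implementation, the following plain-Python example converts a range image to per-pixel Cartesian points and collects the 3D offsets to a pixel's image-space neighbors. The function names `range_image_to_points` and `local_offsets`, the angle conventions (inclination measured up from the horizontal plane, one azimuth per column), and the window radius are all assumptions made for illustration.

```python
import math


def range_image_to_points(range_image, inclinations, azimuths):
    """Convert a range image to per-pixel (x, y, z) points.

    range_image: list of rows; each row is a list of ranges (one row per
        laser inclination, one column per azimuth).
    inclinations: one angle per row, in radians (assumed convention:
        measured up from the horizontal plane).
    azimuths: one angle per column, in radians.
    """
    points = []
    for row, incl in zip(range_image, inclinations):
        out_row = []
        for r, az in zip(row, azimuths):
            # Standard spherical-to-Cartesian conversion.
            x = r * math.cos(incl) * math.cos(az)
            y = r * math.cos(incl) * math.sin(az)
            z = r * math.sin(incl)
            out_row.append((x, y, z))
        points.append(out_row)
    return points


def local_offsets(points, i, j, radius=1):
    """Return 3D offsets from pixel (i, j) to the pixels in its image window.

    The window is (2 * radius + 1) pixels on each side and is clipped at
    the image border. Azimuth wrap-around is ignored for simplicity.
    """
    height, width = len(points), len(points[0])
    cx, cy, cz = points[i][j]
    offsets = []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                x, y, z = points[ni][nj]
                offsets.append((x - cx, y - cy, z - cz))
    return offsets
```

Offsets of this kind are one plausible input to a geometry-aware kernel: pixels that are adjacent in the image can be far apart in 3D, and the offsets expose that difference to the layer.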