Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | 3D point cloud perception has become fundamental to a wide range of
applications. In particular, with the rapid development of neural networks,
voxel-based networks have attracted great attention due to their excellent
performance. Various accelerator designs have been proposed to improve the
hardware performance of voxel-based networks, especially to speed up the map
search process. However, several challenges remain: (1) massive off-chip data
access volume caused by map search operations, notably in high-resolution and
densely distributed cases, (2) frequent data movement for data-intensive
convolution operations, and (3) imbalanced workload caused by the irregular
sparsity of point data.
To address these challenges, we propose Voxel-CIM, an efficient
Compute-in-Memory based accelerator for voxel-based neural network processing.
To reduce off-chip memory access for map search, a depth-encoding-based
output-major search approach is introduced to maximize data reuse, achieving a
stable $O(N)$-level data access volume in various situations. Voxel-CIM also
employs the in-memory computing paradigm and designs innovative weight mapping
strategies to efficiently process sparse 3D convolutions and 2D convolutions.
Implemented in 22 nm technology and evaluated on representative benchmarks,
Voxel-CIM achieves on average 4.5~7.0$\times$ higher energy efficiency (10.8
TOPS/W), and 2.4~5.4$\times$ speedup in the detection task and
1.2~8.1$\times$ speedup in the segmentation task, compared to
state-of-the-art point cloud accelerators and powerful GPUs. |
---|---|
DOI: | 10.48550/arxiv.2409.19077 |
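
For readers unfamiliar with the "map search" step named in the summary, the following is a minimal, generic sketch of what such a search computes for a sparse 3D convolution. It is not the depth-encoding-based approach of Voxel-CIM, whose details are not given in this record. The function name `build_kernel_map`, the hash-table lookup, and the submanifold assumption (output voxels sit at the same coordinates as input voxels) are all illustrative assumptions.

```python
from itertools import product


def build_kernel_map(coords, kernel_size=3):
    """Generic output-major map search (illustrative, not the paper's method).

    coords: list of (x, y, z) integer tuples for the non-empty voxels.
    Assumes a submanifold convolution: outputs share the input coordinates.
    Returns {kernel_offset: [(in_idx, out_idx), ...]}.
    """
    # Hash table from voxel coordinate to its row index in the feature matrix.
    index_of = {c: i for i, c in enumerate(coords)}
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    kernel_map = {off: [] for off in offsets}

    # Output-major order: visit each output voxel once, probe all neighbours.
    for out_idx, (x, y, z) in enumerate(coords):
        for dx, dy, dz in offsets:
            in_idx = index_of.get((x + dx, y + dy, z + dz))
            if in_idx is not None:
                kernel_map[(dx, dy, dz)].append((in_idx, out_idx))
    return kernel_map


# Two adjacent voxels and one isolated voxel.
coords = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
kmap = build_kernel_map(coords)
```

Each `(in_idx, out_idx)` pair tells the convolution which input feature to multiply by the weight at that kernel offset and which output row to accumulate into. The irregular length of these per-offset lists is one way to see the workload imbalance the summary mentions.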