Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
Saved in:
Main Authors: , , , ,
Format: Article
Language: English
Abstract: As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model achieves better performance than its predecessor and achieves state-of-the-art results on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.
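
To make the grouped vector attention idea concrete, below is a minimal PyTorch sketch written against the abstract rather than the released code: channels are split into groups, a grouped weight encoding MLP produces one attention weight per group (rather than per channel, as in vector attention, or per head, as in multi-head attention), and relative positions enter through both a multiplicative encoding (the abstract's "position encoding multiplier") and an additive bias. All names here (`GroupedVectorAttention`, `pe_mul`, `pe_bias`, `knn_idx`, ...) are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn as nn

class GroupedVectorAttention(nn.Module):
    """Illustrative sketch (not the authors' implementation): vector
    attention whose weights are shared within channel groups, sitting
    between multi-head attention (one weight per head) and full vector
    attention (one weight per channel)."""

    def __init__(self, channels: int, groups: int):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)
        # Grouped weight encoding: map each query-key relation vector
        # to `groups` scalar weights instead of `channels` weights.
        self.weight_encoding = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, groups),
        )
        # Position encodings: a multiplier on the relation and an
        # additive bias, both produced from relative coordinates.
        self.pe_mul = nn.Sequential(nn.Linear(3, channels), nn.ReLU(inplace=True),
                                    nn.Linear(channels, channels))
        self.pe_bias = nn.Sequential(nn.Linear(3, channels), nn.ReLU(inplace=True),
                                     nn.Linear(channels, channels))

    def forward(self, feat, pos, knn_idx):
        # feat: (n, c) features, pos: (n, 3) coordinates,
        # knn_idx: (n, k) neighbor indices for each point.
        n, c = feat.shape
        q = self.to_q(feat)                               # (n, c)
        k = self.to_k(feat)[knn_idx]                      # (n, k, c)
        v = self.to_v(feat)[knn_idx]                      # (n, k, c)
        rel_pos = pos[knn_idx] - pos[:, None, :]          # (n, k, 3)
        # Query-key relation, modulated by the position multiplier.
        rel = (q[:, None, :] - k) * self.pe_mul(rel_pos) + self.pe_bias(rel_pos)
        w = torch.softmax(self.weight_encoding(rel), dim=1)  # (n, k, groups)
        v = v + self.pe_bias(rel_pos)                     # position bias on values
        v = v.view(n, knn_idx.shape[1], self.groups, c // self.groups)
        out = (w.unsqueeze(-1) * v).sum(dim=1)            # (n, groups, c/groups)
        return out.reshape(n, c)
```

A quick shape check with illustrative sizes (the random `knn_idx` stands in for a real k-NN query):

```python
feat, pos = torch.randn(1024, 96), torch.rand(1024, 3)
knn_idx = torch.randint(0, 1024, (1024, 16))
out = GroupedVectorAttention(channels=96, groups=6)(feat, pos, knn_idx)
print(out.shape)  # torch.Size([1024, 96])
```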
DOI: 10.48550/arxiv.2210.05666
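
The partition-based pooling mentioned in the abstract replaces the usual farthest-point-sampling plus neighbor-query pooling with a uniform partition of space. The sketch below shows one plausible such scheme under that reading: non-overlapping cubic grid cells, mean-pooled positions, max-pooled features. The function `grid_pool` and its signature are assumptions for illustration, not the released API; it requires PyTorch >= 1.12 for `index_reduce_`.

```python
import torch

def grid_pool(feat: torch.Tensor, pos: torch.Tensor, grid_size: float):
    """Illustrative grid pooling: fuse all points falling into the same
    cubic cell of edge `grid_size` into a single point whose position is
    the cell mean and whose feature is the channel-wise max."""
    n = pos.shape[0]
    cell = torch.floor(pos / grid_size).long()   # (n, 3) cell indices
    cell = cell - cell.min(dim=0).values         # shift indices to non-negative
    sizes = cell.max(dim=0).values + 1
    # Collision-free flattening of the 3-D cell index to one key per point.
    key = (cell[:, 0] * sizes[1] + cell[:, 1]) * sizes[2] + cell[:, 2]
    uniq, inverse = torch.unique(key, return_inverse=True)
    m = uniq.shape[0]                            # number of occupied cells
    # Mean position per cell.
    pooled_pos = torch.zeros(m, 3).index_add_(0, inverse, pos)
    counts = torch.zeros(m).index_add_(0, inverse, torch.ones(n))
    pooled_pos = pooled_pos / counts[:, None]
    # Channel-wise max of features per cell.
    pooled_feat = feat.new_full((m, feat.shape[1]), float("-inf"))
    pooled_feat.index_reduce_(0, inverse, feat, reduce="amax")
    return pooled_feat, pooled_pos

feat, pos = torch.randn(1024, 96), torch.rand(1024, 3)
pf, pp = grid_pool(feat, pos, grid_size=0.25)    # far fewer points out than in
```

Because every point maps to exactly one cell, the pooled points form a spatially aligned, non-overlapping cover of the input, which is the "better spatial alignment and more efficient sampling" the abstract contrasts against sampling-based pooling.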