A viewpoint-guided prototype network for 3D shape classification
Multi-view learning methods have achieved remarkable results in 3D shape recognition. However, most of them focus on the visual feature extraction and feature aggregation, while viewpoints (spatial positions of virtual cameras) for generating multiple views are often ignored. In this paper, we deepl...
Gespeichert in:
Veröffentlicht in: | Multimedia systems 2023-12, Vol.29 (6), p.3531-3547 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Multi-view learning methods have achieved remarkable results in 3D shape recognition. However, most of them focus on the visual feature extraction and feature aggregation, while viewpoints (spatial positions of virtual cameras) for generating multiple views are often ignored. In this paper, we deeply explore the correlation between viewpoints and shape descriptor, and propose a novel viewpoint-guided prototype learning network (VGP-Net). We introduce a prototype representation for each class, including viewpoint prototype and feature prototype. The viewpoint prototype is the average weight of each viewpoint learned from a small support set via Score Unit, and stored in a weight dictionary. Our VGP model self-adaptively learns the view-wise weights by dynamically assembling with the viewpoint prototypes in weight dictionary and performing element-wise operation via view pooling layer. Under the guidance of viewpoint prototype, important visual features are enhanced, while those negligible features are eliminated. These refined features are effectively fused to generate compact shape descriptor. All the shape descriptors are clustered in feature embedding space, and the cluster center represents the feature prototype of each class. The classification thus can be performed by searching the nearest distance to feature prototypes. To boost the learning process, we further present a multi-stream regularization mechanism in both feature space and viewpoint space. Extensive experiments demonstrate that our VGP-Net is efficient, and the learned deep features have stronger discrimination ability. Therefore, it can achieve better performance compared to state-of-the-art methods. |
---|---|
ISSN: | 0942-4962 1432-1882 |
DOI: | 10.1007/s00530-023-01177-9 |