3D Object Representation Learning: A Set-to-Set Matching Perspective

Bibliographic details
Published in:	IEEE Transactions on Image Processing 2021, Vol. 30, pp. 2168-2179
Main authors:	Yu, Tan; Meng, Jingjing; Yang, Ming; Yuan, Junsong
Format:	Article
Language:	English
Subjects:	3D object recognition; bilinear pooling; convolutional neural network; Convolutional neural networks; Feature extraction; Feature maps; Kernel; Learning; Matching; Neural networks; Object recognition; Patches (structures); Representations; Similarity; Spirits; Three-dimensional displays; Two dimensional displays

Description
Abstract:	In this paper, we tackle 3D object representation learning from the perspective of set-to-set matching. Given two 3D objects, calculating their similarity is formulated as the problem of measuring set-to-set similarity between two sets of local patches. As local convolutional features from convolutional feature maps are natural representations of local patches, the set-to-set matching between sets of local patches is further converted into a local feature pooling problem. To highlight good matchings and suppress bad ones, we exploit two pooling methods: 1) bilinear pooling and 2) VLAD pooling. We analyze their effectiveness in enhancing set-to-set matching and establish the connection between them. Moreover, to balance the different components inherent in a bilinear-pooled feature, we propose the harmonized bilinear pooling operation, which follows the spirit of the intra-normalization used in VLAD pooling. To achieve an end-to-end trainable framework, we implement the proposed harmonized bilinear pooling and intra-normalized VLAD as two layers to construct two types of neural networks: the multi-view harmonized bilinear network (MHBN) and the multi-view VLAD network (MVLADN). Systematic experiments conducted on two public benchmark datasets demonstrate the efficacy of the proposed MHBN and MVLADN in 3D object recognition.
ISSN:	1057-7149, 1941-0042
DOI:	10.1109/TIP.2021.3049968
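
Illustration:	The abstract's step from set-to-set matching to feature pooling can be illustrated with a small sketch. It is not the authors' implementation: the record does not describe the harmonized bilinear pooling operation, so the code below uses plain (unnormalized) bilinear pooling and a generic intra-normalized VLAD with hard assignment. All function names and the toy patch features are hypothetical. It relies on a standard identity: the inner product of two bilinear-pooled features equals the sum of squared dot products over every cross-set pair of local features, so strong matches dominate the score.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bilinear_pool(features):
    """Sum the outer products x x^T of all local features, flattened to d*d values."""
    d = len(features[0])
    pooled = [0.0] * (d * d)
    for x in features:
        for i in range(d):
            for j in range(d):
                pooled[i * d + j] += x[i] * x[j]
    return pooled


def pairwise_match_score(set_a, set_b):
    """Set-to-set score: squared dot product summed over all cross-set pairs."""
    return sum(dot(x, y) ** 2 for x in set_a for y in set_b)


def vlad_intra_normalized(features, centers):
    """Generic VLAD: sum residuals to the nearest center, then L2-normalize each block."""
    d = len(centers[0])
    blocks = [[0.0] * d for _ in centers]
    for x in features:
        # Hard assignment to the nearest center (squared Euclidean distance).
        k = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
        for i in range(d):
            blocks[k][i] += x[i] - centers[k][i]
    out = []
    for block in blocks:
        norm = math.sqrt(dot(block, block))
        # Intra-normalization: each center's block is scaled to unit length.
        out.extend(v / norm if norm > 0 else 0.0 for v in block)
    return out


# Toy local features standing in for patch descriptors of two objects (assumed values).
patches_a = [[1.0, 0.0], [0.5, 0.5]]
patches_b = [[0.0, 1.0], [1.0, 1.0]]

pooled_score = dot(bilinear_pool(patches_a), bilinear_pool(patches_b))
matching_score = pairwise_match_score(patches_a, patches_b)

vlad = vlad_intra_normalized([[3.0, 0.0], [4.0, 0.0], [0.0, 2.0]],
                             [[2.0, 0.0], [0.0, 3.0]])
```

In this sketch `pooled_score` and `matching_score` agree, which is why comparing pooled features can stand in for explicitly matching every pair of patches. Intra-normalization gives each center's block unit length, so no single cluster dominates the VLAD vector; the abstract describes harmonized bilinear pooling as following the same spirit.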