MHSAN: Multi-view hierarchical self-attention network for 3D shape recognition


Full Description

Bibliographic Details
Published in: Pattern Recognition, 2024-06, Vol. 150, p. 110315, Article 110315
Main Authors: Cao, Jiangzhong; Yu, Lianggeng; Ling, Bingo Wing-Kuen; Yao, Zijie; Dai, Qingyun
Format: Article
Language: English
Online Access: Full text
Description
Summary: Multi-view learning has demonstrated promising performance for 3D shape recognition. However, existing multi-view methods usually focus on fusing multiple views and ignore the structural and discriminative information carried by 2D views. In this paper, we propose a multi-view hierarchical self-attention network (MHSAN) to explore the geometric and discriminative information in complex 2D views. Specifically, MHSAN consists of two self-attention networks. First, a global self-attention network is adopted to exploit structural information by embedding the position information of views. Then, a discriminative self-attention network learns discriminative information from the views with high classification scores. Through the proposed MHSAN, the geometric and discriminative information is condensed into a novel representation of 3D shapes. To validate the effectiveness of our proposed method, extensive experiments have been conducted on three 3D shape benchmarks. Experimental results demonstrate that our method is generally superior to state-of-the-art methods in 3D shape classification and retrieval tasks.

• A hierarchical self-attention network for 3D shape recognition is proposed.
• The global information and the discriminative information are extracted.
• Position information helps to learn the structural information of 3D objects.
• The views with high discriminative scores are given special attention.
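The two-stage pipeline described in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual architecture: the feature dimension, view count, top-k cutoff, random weights, and the placeholder per-view confidence scores are all assumptions; a real model would use learned parameters and CNN-extracted view features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention across the view axis.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
n_views, d = 12, 64  # e.g. 12 rendered views, 64-dim view features (assumed sizes)

views = rng.normal(size=(n_views, d))          # stand-in for CNN view features
pos_emb = rng.normal(size=(n_views, d)) * 0.1  # view-position embeddings (random here, learned in practice)
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Stage 1: global self-attention over all views, with position embeddings
# added so the attention can exploit the viewpoint structure.
global_feats = self_attention(views + pos_emb, wq, wk, wv)

# Stage 2: keep only the top-k views by a per-view classification score
# (random placeholder), then apply discriminative self-attention to them.
conf = rng.random(n_views)        # stand-in for per-view class confidence
k_top = 6
top_idx = np.argsort(conf)[-k_top:]
disc_feats = self_attention(global_feats[top_idx], wq, wk, wv)

# Condense both stages into one shape descriptor via max-pooling over views.
shape_desc = np.concatenate([global_feats.max(axis=0), disc_feats.max(axis=0)])
print(shape_desc.shape)  # (128,)
```

In this sketch the same projection matrices are reused for both stages for brevity; the hierarchical design in the paper uses two separate attention networks.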
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110315