Curriculumformer: Taming Curriculum Pre-Training for Enhanced 3-D Point Cloud Understanding
Published in: IEEE Transactions on Neural Networks and Learning Systems, June 2024, Vol. PP, pp. 1-15
Main authors: , , , , ,
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: Learning universal representations of 3-D point clouds is essential for reducing the need for manual annotation of large-scale and irregular point cloud datasets. The current modus operandi for representation learning is self-supervised learning, which has shown great potential for improving point cloud understanding. Nevertheless, it remains an open problem how to employ auto-encoding to learn universal 3-D representations of irregularly structured point clouds, as previous methods focus on either global shapes or local geometries. To this end, we present a cascaded self-supervised point cloud representation learning framework, dubbed Curriculumformer, which aims to tame curriculum pre-training for enhanced point cloud understanding. Our main idea lies in a progressive pre-training strategy that trains the Transformer in an easy-to-hard manner. Specifically, we first pre-train the Transformer with an upsampling strategy, which allows it to learn global information. We then follow up with a completion strategy, which enables the Transformer to gain insight into local geometries. Finally, we propose a Multi-Modal Multi-Modality Contrastive Learning (M4CL) strategy to enhance representation learning by enriching the Transformer with semantic information. In this way, the pre-trained Transformer can be easily transferred to a wide range of downstream applications. We demonstrate the superior performance of Curriculumformer on various discriminative and generative tasks, outperforming state-of-the-art methods. Moreover, Curriculumformer can be integrated into other off-the-shelf methods to improve their performance. Our code is available at https://github.com/Fayeben/Curriculumformer.
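The abstract describes a three-stage, easy-to-hard curriculum: upsampling pre-training for global shape, completion pre-training for local geometry, and M4CL contrastive alignment with other modalities. The sketch below is a minimal, hypothetical illustration of such a schedule, not the authors' implementation: the toy encoder, the heads (TinyPointEncoder, up_head, comp_head, proj), the Chamfer and InfoNCE-style losses, and all sizes are placeholder assumptions; the actual Transformer backbone and objectives are in the linked repository.

```python
# Hypothetical sketch of an easy-to-hard curriculum pre-training schedule.
# Module names, heads, and sizes are illustrative placeholders, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPointEncoder(nn.Module):
    """Toy stand-in for the Transformer encoder (per-point MLP + max pooling)."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, pts):                      # pts: (B, N, 3)
        return self.mlp(pts).max(dim=1).values   # global feature: (B, dim)

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (B, N, 3) and b: (B, M, 3)."""
    d = torch.cdist(a, b)                        # pairwise distances (B, N, M)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

encoder = TinyPointEncoder()
up_head = nn.Linear(128, 2048 * 3)               # stage 1: upsampling head
comp_head = nn.Linear(128, 1024 * 3)             # stage 2: completion head
proj = nn.Linear(128, 64)                        # stage 3: contrastive projection
params = (list(encoder.parameters()) + list(up_head.parameters())
          + list(comp_head.parameters()) + list(proj.parameters()))
opt = torch.optim.AdamW(params, lr=1e-4)

def stage1_upsampling(sparse_pts, dense_pts):
    """Easy task: predict a dense cloud from a sparse one (global shape)."""
    pred = up_head(encoder(sparse_pts)).view(-1, 2048, 3)
    return chamfer(pred, dense_pts)

def stage2_completion(partial_pts, full_pts):
    """Harder task: complete a partial cloud (local geometry)."""
    pred = comp_head(encoder(partial_pts)).view(-1, 1024, 3)
    return chamfer(pred, full_pts)

def stage3_contrastive(pts, paired_emb, temperature=0.07):
    """Hardest task: align point features with paired multi-modal embeddings
    via an InfoNCE-style loss (a stand-in for the paper's M4CL objective)."""
    z_p = F.normalize(proj(encoder(pts)), dim=-1)
    z_m = F.normalize(paired_emb, dim=-1)
    logits = z_p @ z_m.t() / temperature
    labels = torch.arange(z_p.size(0))           # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Curriculum: run the stages in order, each starting from the previous weights.
for stage, loss_fn, data in [
    ("upsampling", stage1_upsampling, (torch.rand(4, 512, 3), torch.rand(4, 2048, 3))),
    ("completion", stage2_completion, (torch.rand(4, 512, 3), torch.rand(4, 1024, 3))),
    ("m4cl", stage3_contrastive, (torch.rand(4, 1024, 3), torch.rand(4, 64))),
]:
    loss = loss_fn(*data)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(stage, float(loss))
```

The point of the schedule is that each stage reuses the encoder weights learned by the previous, easier stage, so the model sees global structure first and only later the harder local-geometry and cross-modal semantic signals.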
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2024.3406587