Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Format: Article
Language: English
Abstract: Mainstream 3D representation learning approaches are built on contrastive or generative modeling pretext tasks and have achieved great performance improvements on various downstream tasks. However, we find that the two paradigms have different characteristics: (i) contrastive models are data-hungry and suffer from a representation over-fitting issue; (ii) generative models have a data-filling issue and show inferior data-scaling capacity compared to contrastive models. This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two. In this paper, we propose Contrast with Reconstruct (ReCon), which unifies these two paradigms. ReCon is trained to learn from both generative modeling teachers and single/cross-modal contrastive teachers through ensemble distillation, where the generative student guides the contrastive student. We propose an encoder-decoder style ReCon-block that transfers knowledge through cross attention with stop-gradient, which avoids the pretraining over-fitting and pattern-difference issues. ReCon achieves a new state of the art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN. Code has been released at https://github.com/qizekun/ReCon.
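
The one concrete mechanism the abstract names is the ReCon-block's cross attention with stop-gradient, where the generative student guides the contrastive student without receiving gradients back. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the class and parameter names (ReConBlockSketch, num_queries, embed_dim, num_heads) and all dimensions are illustrative assumptions. Refer to the linked repository for the actual code.

```python
# Minimal sketch of cross attention with stop-gradient, as described in the
# abstract. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ReConBlockSketch(nn.Module):
    """Contrastive-student queries read from generative-student tokens.

    The generative tokens are detached (stop-gradient), so the contrastive
    objective cannot back-propagate into the generative branch; this is the
    mechanism the abstract credits with avoiding pretraining over-fitting
    and the pattern-difference issue.
    """

    def __init__(self, embed_dim: int = 384, num_heads: int = 6, num_queries: int = 3):
        super().__init__()
        # Learnable global queries for the contrastive student
        # (e.g., one query per contrastive teacher).
        self.queries = nn.Parameter(torch.zeros(1, num_queries, embed_dim))
        nn.init.trunc_normal_(self.queries, std=0.02)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(embed_dim)
        self.norm_kv = nn.LayerNorm(embed_dim)

    def forward(self, gen_tokens: torch.Tensor) -> torch.Tensor:
        # gen_tokens: (B, N, D) local tokens from the generative student.
        # detach() implements the stop-gradient: knowledge flows from the
        # generative branch to the contrastive branch, but gradients do not
        # flow back.
        kv = self.norm_kv(gen_tokens.detach())
        q = self.norm_q(self.queries.expand(gen_tokens.size(0), -1, -1))
        out, _ = self.cross_attn(q, kv, kv)
        # (B, num_queries, D) global features for contrastive distillation.
        return out


if __name__ == "__main__":
    block = ReConBlockSketch()
    tokens = torch.randn(2, 64, 384)  # e.g., 64 point-patch tokens per cloud
    print(block(tokens).shape)  # torch.Size([2, 3, 384])
```

The detach() call is the whole trick: the contrastive student can still read the generative student's features as keys and values, but its loss cannot disturb the generative pretraining.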
DOI: 10.48550/arxiv.2302.02318