Enhancing 2D Representation Learning with a 3D Prior
Format: Article
Language: English
Online access: Order full text
Abstract: Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data, which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw, unlabeled visual data alone. However, unlike humans, who obtain rich 3D information from their binocular vision and through motion, the majority of current self-supervised methods are tasked with learning from monocular 2D image collections. This is noteworthy, as it has been demonstrated that shape-centric visual processing is more robust than texture-biased automated methods. Inspired by this, we propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior directly in the model during training. Through experiments across a range of datasets, we demonstrate that our 3D-aware representations are more robust than conventional self-supervised baselines.
DOI: 10.48550/arxiv.2406.02535