Enhancing 2D Representation Learning with a 3D Prior

Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data, which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone.

Bibliographic Details
Main Authors: Aygün, Mehmet; Dhar, Prithviraj; Yan, Zhicheng; Mac Aodha, Oisin; Ranjan, Rakesh
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
description Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone. However, unlike humans who obtain rich 3D information from their binocular vision and through motion, the majority of current self-supervised methods are tasked with learning from monocular 2D image collections. This is noteworthy as it has been demonstrated that shape-centric visual processing is more robust compared to texture-biased automated methods. Inspired by this, we propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior directly into the model during training. Through experiments, across a range of datasets, we demonstrate that our 3D aware representations are more robust compared to conventional self-supervised baselines.
format Article
identifier DOI: 10.48550/arxiv.2406.02535
language eng
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Enhancing 2D Representation Learning with a 3D Prior
url https://arxiv.org/abs/2406.02535