Enhancing 2D Representation Learning with a 3D Prior

Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data, which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone.

Bibliographic Details
Main Authors: Aygün, Mehmet; Dhar, Prithviraj; Yan, Zhicheng; Mac Aodha, Oisin; Ranjan, Rakesh
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
description Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone. However, unlike humans who obtain rich 3D information from their binocular vision and through motion, the majority of current self-supervised methods are tasked with learning from monocular 2D image collections. This is noteworthy as it has been demonstrated that shape-centric visual processing is more robust compared to texture-biased automated methods. Inspired by this, we propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior directly into the model during training. Through experiments, across a range of datasets, we demonstrate that our 3D aware representations are more robust compared to conventional self-supervised baselines.
format Article
identifier DOI: 10.48550/arxiv.2406.02535
language eng
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Enhancing 2D Representation Learning with a 3D Prior
url https://arxiv.org/abs/2406.02535