Enhancing 2D Representation Learning with a 3D Prior
creator | Aygün, Mehmet; Dhar, Prithviraj; Yan, Zhicheng; Mac Aodha, Oisin; Ranjan, Rakesh
description | Learning robust and effective representations of visual data is a fundamental task in computer vision. Traditionally, this is achieved by training models with labeled data, which can be expensive to obtain. Self-supervised learning attempts to circumvent the requirement for labeled data by learning representations from raw unlabeled visual data alone. However, unlike humans, who obtain rich 3D information from their binocular vision and through motion, the majority of current self-supervised methods learn from monocular 2D image collections. This is noteworthy, as it has been demonstrated that shape-centric visual processing is more robust than texture-biased automated processing. Inspired by this, we propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior directly in the model during training. Through experiments across a range of datasets, we demonstrate that our 3D-aware representations are more robust than conventional self-supervised baselines.
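The record contains only the abstract, not the training objective, so the paper's actual mechanism for enforcing the 3D prior is unknown here. As a rough illustration of the general idea only, the sketch below combines a standard two-view self-supervised invariance loss with a hypothetical auxiliary depth-regression term standing in for the 3D structural prior. The class name `SSLWith3DPrior`, the `depth_head`, the `lambda_3d` weighting, and the `pseudo_depth` targets are all assumptions for this sketch, not the authors' method.

```python
import torch.nn as nn
import torch.nn.functional as F


class SSLWith3DPrior(nn.Module):
    """Hypothetical sketch: two-view self-supervised loss plus an
    auxiliary depth-regression term acting as a 3D structural prior."""

    def __init__(self, backbone: nn.Module, feat_dim: int, lambda_3d: float = 0.5):
        super().__init__()
        self.backbone = backbone          # any 2D encoder mapping images -> (B, feat_dim)
        self.lambda_3d = lambda_3d        # assumed weighting of the 3D-prior term
        self.depth_head = nn.Sequential(  # assumed auxiliary head for the 3D prior
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),       # scalar depth here; real methods would predict dense depth
        )

    def forward(self, view_a, view_b, pseudo_depth):
        za = self.backbone(view_a)
        zb = self.backbone(view_b)
        # Standard invariance term: pull features of the two augmented views together.
        ssl_loss = 1.0 - F.cosine_similarity(za, zb, dim=-1).mean()
        # 3D-prior term: regress (pseudo-)depth targets from the same features.
        depth_loss = F.l1_loss(self.depth_head(za).squeeze(-1), pseudo_depth)
        return ssl_loss + self.lambda_3d * depth_loss
```

For a smoke test, `backbone` could be as simple as `nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))` with `feat_dim=128`, and the pseudo-depth targets could come from an off-the-shelf monocular depth estimator, a common way to inject 3D supervision without manual labels; whether the paper does anything like this cannot be verified from this record.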
doi_str_mv | 10.48550/arxiv.2406.02535 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2406.02535 |
language | eng |
recordid | cdi_arxiv_primary_2406_02535 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | Enhancing 2D Representation Learning with a 3D Prior |
url | https://arxiv.org/abs/2406.02535