Self-Supervised Ranking for Representation Learning

We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context on a large number of random views (augmentations) obtained from images. Our work is based on two intuitions: first, a good representation of images must yield a...


Bibliographic details
Main authors:	Varamesh, Ali; Diba, Ali; Tuytelaars, Tinne; Van Gool, Luc
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Varamesh, Ali; Diba, Ali; Tuytelaars, Tinne; Van Gool, Luc
description	We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context on a large number of random views (augmentations) obtained from images. Our work is based on two intuitions: first, a good representation of images must yield a high-quality image ranking in a retrieval task; second, we would expect random views of an image to be ranked closer to a reference view of that image than random views of other images. Hence, we model representation learning as a learning-to-rank problem for image retrieval. We train a representation encoder by maximizing average precision (AP) for ranking, where random views of an image are considered positives and views of other images are considered negatives. The new framework, dubbed S2R2, enables computing a global objective on multiple views, compared to the local objective in the popular contrastive learning framework, which is calculated on pairs of views. In principle, by using a ranking criterion, we eliminate reliance on object-centric curated datasets. When trained on STL10 and MS-COCO, S2R2 outperforms SimCLR and the clustering-based contrastive learning model, SwAV, while being much simpler both conceptually and in implementation. On MS-COCO, S2R2 outperforms both SwAV and SimCLR by a larger margin than on STL10. This indicates that S2R2 is more effective on diverse scenes and could eliminate the need for a large object-centric training dataset for self-supervised representation learning.
doi_str_mv	10.48550/arxiv.2010.07258
format	Article
creationdate	2020-10-14
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
sourcetype	Open Access Repository
linktorsrc	https://arxiv.org/abs/2010.07258
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2010.07258
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2010_07258
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Self-Supervised Ranking for Representation Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T18%3A35%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Self-Supervised%20Ranking%20for%20Representation%20Learning&rft.au=Varamesh,%20Ali&rft.date=2020-10-14&rft_id=info:doi/10.48550/arxiv.2010.07258&rft_dat=%3Carxiv_GOX%3E2010_07258%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
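
The description field above says the encoder is trained by maximizing average precision (AP) over a ranking in which views of the same image are positives and views of other images are negatives. As a rough illustration of the quantity involved, the sketch below computes the exact, non-differentiable AP of one such ranking in plain Python. It is not the authors' implementation: the record does not say how AP is made trainable, and the cosine similarity, the function names, and the toy embeddings are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_precision(query, candidates, is_positive):
    """Exact AP of the ranking of `candidates` by similarity to `query`.

    is_positive[i] is True when candidates[i] is a view of the same image
    as the query, and False when it is a view of another image.
    """
    # Rank candidate indices from most to least similar to the query.
    order = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


# Hypothetical 2-D embeddings: one reference view and four candidate views.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]

# Both positives are ranked ahead of every negative.
perfect = average_precision(query, candidates, [True, False, True, False])

# One negative is ranked between the two positives.
imperfect = average_precision(query, candidates, [True, True, False, False])

print(perfect, imperfect)
```

Because AP depends on the whole ordering of views rather than on a single pair, it is a global objective in the sense the description uses; training on it would additionally need a differentiable approximation, which this sketch does not attempt.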