Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. In this work, we first present quantitative experiments to demystify these gains. We demonstrate that approaches like MOCO and PIRL learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like Imagenet. Finally, we propose an approach to leverage unstructured videos to learn representations that possess higher viewpoint invariance. Our results show that the learned representations outperform MOCOv2 trained on the same data in terms of invariances encoded and the performance on downstream image classification and semantic segmentation tasks.
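The instance-discrimination objective described in the abstract, treating an image and its augmented views as samples of a single class against all other images, is typically trained with an InfoNCE-style contrastive loss. Below is a minimal NumPy sketch of that loss for a single query; the function name and shapes are illustrative, not the authors' implementation (MoCo additionally uses a momentum encoder and a queue of negative keys):

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query embedding.

    The augmented view (positive) is the only "correct class" among
    the negatives (embeddings of other images).
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = normalize(query)          # shape (d,)
    k_pos = normalize(positive)   # shape (d,)
    k_neg = normalize(negatives)  # shape (n, d)

    # Cosine similarities scaled by temperature; positive goes first.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature

    # Cross-entropy with the positive at index 0 (stabilized softmax).
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

The loss is small when the query embedding sits close to its augmented view and far from the negatives, which is exactly the pressure toward augmentation invariance that the paper analyzes.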

Detailed Description

Saved in:
Bibliographic Details
Published in: arXiv.org 2020-07
Main Authors: Purushwalkam, Senthil; Gupta, Abhinav
Format: Article
Language: English
Online Access: Full text
format Article
EISSN: 2331-8422
Subjects: Datasets; Image classification; Image segmentation; Invariance; Object recognition; Occlusion; Representations; Self-supervised learning; Supervised learning; Training