Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. In this work, we first present quantitative experiments to demystify these gains. We demonstrate that approaches like MOCO and PIRL learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like Imagenet. Finally, we propose an approach to leverage unstructured videos to learn representations that possess higher viewpoint invariance. Our results show that the learned representations outperform MOCOv2 trained on the same data in terms of invariances encoded and the performance on downstream image classification and semantic segmentation tasks.
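The instance-discrimination objective described in the abstract, treating an image and its augmented views as samples of a single class against all other images, is typically trained with an InfoNCE-style contrastive loss. Below is a minimal NumPy sketch of that loss for a single query; the function name and shapes are illustrative, not the authors' implementation (MoCo additionally uses a momentum encoder and a queue of negative keys):

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query embedding.

    The augmented view (positive) is the only "correct class" among
    the negatives (embeddings of other images).
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = normalize(query)          # shape (d,)
    k_pos = normalize(positive)   # shape (d,)
    k_neg = normalize(negatives)  # shape (n, d)

    # Cosine similarities scaled by temperature; positive goes first.
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / temperature

    # Cross-entropy with the positive at index 0 (stabilized softmax).
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

The loss is small when the query embedding sits close to its augmented view and far from the negatives, which is exactly the pressure toward augmentation invariance that the paper analyzes.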

Detailed Description

Saved in:
Bibliographic Details
Published in: arXiv.org 2020-07
Main Authors: Purushwalkam, Senthil; Gupta, Abhinav
Format: Article
Language: English
Online Access: Full text
format Article
EISSN: 2331-8422
Subjects: Datasets; Image classification; Image segmentation; Invariance; Object recognition; Occlusion; Representations; Self-supervised learning; Supervised learning; Training