Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases
Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each...
Saved in:
Published in: | arXiv.org 2020-07 |
---|---|
Main authors: | Purushwalkam, Senthil ; Gupta, Abhinav |
Format: | Article |
Language: | eng |
Subjects: | Datasets ; Image classification ; Image segmentation ; Invariance ; Object recognition ; Occlusion ; Representations ; Self-supervised learning ; Supervised learning ; Training |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Purushwalkam, Senthil ; Gupta, Abhinav |
description | Self-supervised representation learning approaches have recently surpassed their supervised learning counterparts on downstream tasks like object detection and image classification. Somewhat mysteriously, the recent gains in performance come from training instance classification models, treating each image and its augmented versions as samples of a single class. In this work, we first present quantitative experiments to demystify these gains. We demonstrate that approaches like MOCO and PIRL learn occlusion-invariant representations. However, they fail to capture viewpoint and category instance invariance, which are crucial components for object recognition. Second, we demonstrate that these approaches obtain further gains from access to a clean object-centric training dataset like ImageNet. Finally, we propose an approach to leverage unstructured videos to learn representations that possess higher viewpoint invariance. Our results show that the learned representations outperform MOCOv2 trained on the same data in terms of invariances encoded and the performance on downstream image classification and semantic segmentation tasks. |
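As a reading aid for the abstract's central idea — training instance classification models that treat each image and its augmented views as one class — here is a minimal NumPy sketch of a generic InfoNCE-style contrastive objective of the kind methods like MOCO and PIRL optimize. This is not code from the paper; the function name, the temperature default, and the toy embeddings are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style instance-discrimination loss for a single query.

    `query` and `positive` are embeddings of two augmented views of the
    same image; `negatives` (shape K x d) are embeddings of other images.
    Minimizing the loss pulls the two views together and pushes the
    other images away, which is what makes the learned representation
    invariant to the chosen augmentations (e.g. occlusion via cropping).
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q = l2norm(np.asarray(query, dtype=float))
    p = l2norm(np.asarray(positive, dtype=float))
    n = l2norm(np.asarray(negatives, dtype=float))

    # Cosine similarities: one positive logit, K negative logits.
    logits = np.concatenate(([q @ p], n @ q)) / temperature
    logits -= logits.max()  # numerical stability before exponentiation
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy with the positive pair as the "correct class".
    return -np.log(probs[0])
```

Under this objective the loss is small when the two views embed close together and far from the negatives, and large otherwise; viewpoint invariance, which the paper argues these methods lack, would require positives that actually span viewpoints (e.g. frames from unstructured videos).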
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2020-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2428788078 |
source | Free E-Journals |
subjects | Datasets ; Image classification ; Image segmentation ; Invariance ; Object recognition ; Occlusion ; Representations ; Self-supervised learning ; Supervised learning ; Training |
title | Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T02%3A32%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Demystifying%20Contrastive%20Self-Supervised%20Learning:%20Invariances,%20Augmentations%20and%20Dataset%20Biases&rft.jtitle=arXiv.org&rft.au=Purushwalkam,%20Senthil&rft.date=2020-07-29&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2428788078%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2428788078&rft_id=info:pmid/&rfr_iscdi=true |