Unsupervised Manifold Linearizing and Clustering

We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical prob...

Detailed description

Bibliographic details
Main authors:	Ding, Tianjiao; Tong, Shengbang; Chan, Kwan Ho Ryan; Dai, Xili; Ma, Yi; Haeffele, Benjamin D
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning

creator	Ding, Tianjiao; Tong, Shengbang; Chan, Kwan Ho Ryan; Dai, Xili; Ma, Yi; Haeffele, Benjamin D
description	We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images cannot be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
doi_str_mv	10.48550/arxiv.2301.01805
format	Article
creationdate	2023-01-04
rights	http://creativecommons.org/licenses/by/4.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/2301.01805
collection	arXiv Computer Science; arXiv.org
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2301.01805
language	eng
recordid	cdi_arxiv_primary_2301_01805
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
title	Unsupervised Manifold Linearizing and Clustering
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T12%3A13%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Unsupervised%20Manifold%20Linearizing%20and%20Clustering&rft.au=Ding,%20Tianjiao&rft.date=2023-01-04&rft_id=info:doi/10.48550/arxiv.2301.01805&rft_dat=%3Carxiv_GOX%3E2301_01805%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true