Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering

Multi-modal clustering (MMC) aims to explore complementary information from diverse modalities to improve clustering performance. This article studies two challenging problems in MMC methods based on deep neural networks. On one hand, most existing methods lack a unified objective to simultaneously learn the inter- and intra-modality consistency, resulting in a limited representation learning capacity. On the other hand, most existing processes are modeled on a finite sample set and cannot handle out-of-sample data. To handle these two challenges, we propose a novel Graph Embedding Contrastive Multi-modal Clustering network (GECMC), which treats representation learning and multi-modal clustering as two sides of one coin rather than as two separate problems. In brief, we design a contrastive loss that benefits from pseudo-labels to explore consistency across modalities. Thus, GECMC offers an effective way to maximize the similarities of intra-cluster representations while minimizing the similarities of inter-cluster representations at both the inter- and intra-modality levels. The clustering and representation learning therefore interact and jointly evolve in a co-training framework. After that, we build a clustering layer parameterized with cluster centroids, showing that GECMC can learn clustering labels from given samples and handle out-of-sample data. GECMC yields superior results compared with 14 competitive methods on four challenging datasets. Codes and datasets are available: https://github.com/xdweixia/GECMC.

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on image processing 2023-01, Vol.PP, p.1-1
Main authors: Xia, Wei, Wang, Tianxiu, Gao, Quanxue, Yang, Ming, Gao, Xinbo
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE transactions on image processing
container_volume PP
creator Xia, Wei
Wang, Tianxiu
Gao, Quanxue
Yang, Ming
Gao, Xinbo
description Multi-modal clustering (MMC) aims to explore complementary information from diverse modalities to improve clustering performance. This article studies two challenging problems in MMC methods based on deep neural networks. On one hand, most existing methods lack a unified objective to simultaneously learn the inter- and intra-modality consistency, resulting in a limited representation learning capacity. On the other hand, most existing processes are modeled on a finite sample set and cannot handle out-of-sample data. To handle these two challenges, we propose a novel Graph Embedding Contrastive Multi-modal Clustering network (GECMC), which treats representation learning and multi-modal clustering as two sides of one coin rather than as two separate problems. In brief, we design a contrastive loss that benefits from pseudo-labels to explore consistency across modalities. Thus, GECMC offers an effective way to maximize the similarities of intra-cluster representations while minimizing the similarities of inter-cluster representations at both the inter- and intra-modality levels. The clustering and representation learning therefore interact and jointly evolve in a co-training framework. After that, we build a clustering layer parameterized with cluster centroids, showing that GECMC can learn clustering labels from given samples and handle out-of-sample data. GECMC yields superior results compared with 14 competitive methods on four challenging datasets. Codes and datasets are available: https://github.com/xdweixia/GECMC.
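The abstract names two mechanisms: a contrastive loss driven by pseudo-labels, and a clustering layer parameterized with cluster centroids that can score out-of-sample data. The paper's exact formulation is not reproduced in this record; the sketch below is a hypothetical NumPy illustration of those two generic ingredients (a DEC-style Student's t soft assignment and a supervised-contrastive loss over pseudo-labels), not the authors' implementation — function names and default parameters here are assumptions.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """DEC-style soft cluster assignment with a Student's t kernel.

    q[i, j] is the probability that embedding i belongs to centroid j;
    because the layer is parameterized only by the centroids, it can
    score unseen (out-of-sample) embeddings the same way.
    """
    # squared distances between every embedding and every centroid
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def pseudo_label_contrastive_loss(z, labels, tau=0.5):
    """Supervised-contrastive loss guided by pseudo-labels.

    Pairs sharing a pseudo-label are pulled together (intra-cluster
    similarity maximized); all other pairs act as negatives
    (inter-cluster similarity minimized).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine space
    sim = np.exp(z @ z.T / tau)
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        denom = sim[i].sum() - sim[i, i]  # exclude self-similarity
        for j in range(n):
            if j != i and labels[j] == labels[i]:  # positive pair
                loss += -np.log(sim[i, j] / denom)
                pairs += 1
    return loss / max(pairs, 1)
```

With embeddings that already form tight clusters, labels matching those clusters give a lower loss than shuffled labels, which is the signal that lets clustering and representation learning co-evolve.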
doi_str_mv 10.1109/TIP.2023.3240863
format Article
publisher United States: IEEE
pmid 37022431
coden IIPRE4
orcidid 0000-0001-8988-8381
0000-0003-1810-1566
0000-0001-5082-2940
fulltext fulltext_linktorsrc
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2023-01, Vol.PP, p.1-1
issn 1057-7149
1941-0042
language eng
recordid cdi_proquest_miscellaneous_2797148768
source IEEE Electronic Library (IEL)
subjects Artificial neural networks
Centroids
Clustering
Consistency
Datasets
Embedding
Graphical representations
Labels
Machine learning
Multi-modal learning
representation learning
self-supervision
Similarity
title Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T10%3A21%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graph%20Embedding%20Contrastive%20Multi-Modal%20Representation%20Learning%20for%20Clustering&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Xia,%20Wei&rft.date=2023-01-01&rft.volume=PP&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2023.3240863&rft_dat=%3Cproquest_RIE%3E2776795838%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2776795838&rft_id=info:pmid/37022431&rft_ieee_id=10036442&rfr_iscdi=true