Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering

Multi-modal clustering (MMC) aims to explore complementary information from diverse modalities to improve clustering performance. This article studies two challenging problems in MMC methods based on deep neural networks. On one hand, most existing methods lack a unified objective to simultaneously learn the inter- and intra-modality consistency, resulting in a limited representation learning capacity. On the other hand, most existing processes are modeled on a finite sample set and cannot handle out-of-sample data. To handle these two challenges, we propose a novel Graph Embedding Contrastive Multi-modal Clustering network (GECMC), which treats representation learning and multi-modal clustering as two sides of one coin rather than as two separate problems. In brief, we design a contrastive loss that benefits from pseudo-labels to explore consistency across modalities. Thus, GECMC offers an effective way to maximize the similarities of intra-cluster representations while minimizing the similarities of inter-cluster representations at both the inter- and intra-modality levels. The clustering and representation learning therefore interact and jointly evolve in a co-training framework. After that, we build a clustering layer parameterized with cluster centroids, showing that GECMC can learn clustering labels from given samples and handle out-of-sample data. GECMC yields superior results compared with 14 competitive methods on four challenging datasets. Codes and datasets are available: https://github.com/xdweixia/GECMC.

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on image processing 2023-01, Vol.PP, p.1-1
Main authors: Xia, Wei, Wang, Tianxiu, Gao, Quanxue, Yang, Ming, Gao, Xinbo
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE transactions on image processing
container_volume PP
creator Xia, Wei
Wang, Tianxiu
Gao, Quanxue
Yang, Ming
Gao, Xinbo
description Multi-modal clustering (MMC) aims to explore complementary information from diverse modalities to improve clustering performance. This article studies two challenging problems in MMC methods based on deep neural networks. On one hand, most existing methods lack a unified objective to simultaneously learn the inter- and intra-modality consistency, resulting in a limited representation learning capacity. On the other hand, most existing processes are modeled on a finite sample set and cannot handle out-of-sample data. To handle these two challenges, we propose a novel Graph Embedding Contrastive Multi-modal Clustering network (GECMC), which treats representation learning and multi-modal clustering as two sides of one coin rather than as two separate problems. In brief, we design a contrastive loss that benefits from pseudo-labels to explore consistency across modalities. Thus, GECMC offers an effective way to maximize the similarities of intra-cluster representations while minimizing the similarities of inter-cluster representations at both the inter- and intra-modality levels. The clustering and representation learning therefore interact and jointly evolve in a co-training framework. After that, we build a clustering layer parameterized with cluster centroids, showing that GECMC can learn clustering labels from given samples and handle out-of-sample data. GECMC yields superior results compared with 14 competitive methods on four challenging datasets. Codes and datasets are available: https://github.com/xdweixia/GECMC.
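The abstract names two mechanisms: a contrastive loss driven by pseudo-labels, and a clustering layer parameterized with cluster centroids that can score out-of-sample data. The paper's exact formulation is not reproduced in this record; the sketch below is a hypothetical NumPy illustration of those two generic ingredients (a DEC-style Student's t soft assignment and a supervised-contrastive loss over pseudo-labels), not the authors' implementation — function names and default parameters here are assumptions.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """DEC-style soft cluster assignment with a Student's t kernel.

    q[i, j] is the probability that embedding i belongs to centroid j;
    because the layer is parameterized only by the centroids, it can
    score unseen (out-of-sample) embeddings the same way.
    """
    # squared distances between every embedding and every centroid
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def pseudo_label_contrastive_loss(z, labels, tau=0.5):
    """Supervised-contrastive loss guided by pseudo-labels.

    Pairs sharing a pseudo-label are pulled together (intra-cluster
    similarity maximized); all other pairs act as negatives
    (inter-cluster similarity minimized).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine space
    sim = np.exp(z @ z.T / tau)
    n = len(labels)
    loss, pairs = 0.0, 0
    for i in range(n):
        denom = sim[i].sum() - sim[i, i]  # exclude self-similarity
        for j in range(n):
            if j != i and labels[j] == labels[i]:  # positive pair
                loss += -np.log(sim[i, j] / denom)
                pairs += 1
    return loss / max(pairs, 1)
```

With embeddings that already form tight clusters, labels matching those clusters give a lower loss than shuffled labels, which is the signal that lets clustering and representation learning co-evolve.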
doi_str_mv 10.1109/TIP.2023.3240863
format Article
publisher United States: IEEE
pmid 37022431
coden IIPRE4
orcidid 0000-0001-8988-8381
0000-0003-1810-1566
0000-0001-5082-2940
fulltext fulltext_linktorsrc
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2023-01, Vol.PP, p.1-1
issn 1057-7149
1941-0042
language eng
recordid cdi_proquest_miscellaneous_2797148768
source IEEE Electronic Library (IEL)
subjects Artificial neural networks
Centroids
Clustering
Consistency
Datasets
Embedding
Graphical representations
Labels
Machine learning
Multi-modal learning
representation learning
self-supervision
Similarity
title Graph Embedding Contrastive Multi-Modal Representation Learning for Clustering
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T10%3A21%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Graph%20Embedding%20Contrastive%20Multi-Modal%20Representation%20Learning%20for%20Clustering&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Xia,%20Wei&rft.date=2023-01-01&rft.volume=PP&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2023.3240863&rft_dat=%3Cproquest_RIE%3E2776795838%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2776795838&rft_id=info:pmid/37022431&rft_ieee_id=10036442&rfr_iscdi=true