Deep contrastive multi-view clustering with doubly enhanced commonality

Recently, deep multi-view clustering leveraging autoencoders has garnered significant attention due to its ability to simultaneously enhance feature learning capabilities and optimize clustering outcomes. However, existing autoencoder-based deep multi-view clustering methods often exhibit a tendency to either overly emphasize view-specific information, thus neglecting shared information across views, or alternatively, to place undue focus on shared information, resulting in the dilution of complementary information from individual views. Given the principle that commonality resides within individuality, this paper proposes a staged training approach that comprises two phases: pre-training and fine-tuning. The pre-training phase primarily focuses on learning view-specific information, while the fine-tuning phase aims to doubly enhance commonality across views while maintaining these specific details. Specifically, we learn and extract the specific information of each view through the autoencoder in the pre-training stage. After entering the fine-tuning stage, we first enhance the commonality between independent specific views through the transformer layer, and then further strengthen these commonalities through contrastive learning on the semantic labels of each view, so as to obtain more accurate clustering results.
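The abstract outlines a concrete two-stage pipeline: per-view autoencoders in pre-training, then a transformer layer plus label-level contrastive learning in fine-tuning. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline; the record links no implementation, so all module names, layer sizes, cluster count, and the InfoNCE-style form of the contrastive loss are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the two-stage scheme described in the abstract.
# Dimensions, architectures, and loss forms are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAutoencoder(nn.Module):
    """Stage 1: one autoencoder per view learns view-specific latents
    via reconstruction (assumed MLP architecture)."""
    def __init__(self, in_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def pretrain_step(autoencoders, views, optimizer):
    # Each view minimizes its own reconstruction loss, so the latents
    # retain view-specific (complementary) information.
    loss = sum(F.mse_loss(ae(x)[1], x) for ae, x in zip(autoencoders, views))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

class CommonalityEnhancer(nn.Module):
    """Stage 2, first enhancement: a transformer layer attends across the
    stack of view-specific latents, letting shared structure emerge."""
    def __init__(self, latent_dim=64, n_clusters=10):
        super().__init__()
        self.fuse = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=4,
                                               batch_first=True)
        self.cluster_head = nn.Linear(latent_dim, n_clusters)

    def forward(self, zs):  # zs: list of (batch, latent_dim) tensors
        stacked = torch.stack(zs, dim=1)   # (batch, n_views, latent_dim)
        fused = self.fuse(stacked)         # cross-view self-attention
        # Soft semantic labels per view, fed to the contrastive objective.
        return [F.softmax(self.cluster_head(fused[:, v]), dim=1)
                for v in range(fused.size(1))]

def label_contrastive_loss(p_a, p_b, temperature=0.5):
    # Stage 2, second enhancement: contrast the cluster-assignment columns
    # of two views so the same semantic label aligns across views.
    a = F.normalize(p_a.t(), dim=1)        # (n_clusters, batch)
    b = F.normalize(p_b.t(), dim=1)
    logits = a @ b.t() / temperature       # cluster-to-cluster similarity
    targets = torch.arange(a.size(0))      # matching clusters are positives
    return F.cross_entropy(logits, targets)
```

A plausible training loop would pretrain each ViewAutoencoder with pretrain_step, then jointly fine-tune the encoders and the CommonalityEnhancer with the reconstruction term plus label_contrastive_loss summed over all view pairs, which mirrors the "doubly enhanced commonality" idea: attention-based fusion first, label-level alignment second.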

Bibliographic details
Published in: Multimedia systems, 2024-08, Vol. 30 (4), Article 196
Main authors: Yang, Zhiyuan; Zhu, Changming; Li, Zishi
Format: Article
Language: English
Publisher: Berlin/Heidelberg: Springer Berlin Heidelberg
ISSN: 0942-4962
eISSN: 1432-1882
DOI: 10.1007/s00530-024-01400-1
Subjects: Clustering; Commonality; Computer Communication Networks; Computer Graphics; Computer Science; Cryptology; Data Storage Representation; Dilution; Multimedia Information Systems; Operating Systems; Regular Paper
Online access: Full text