Interactive information bottleneck for high-dimensional co-occurrence data clustering

Bibliographic details
Published in: Applied soft computing, 2021-11, Vol. 111, p. 107837, Article 107837
Main authors: Hu, Shizhe; Wang, Ruobin; Ye, Yangdong
Format: Article
Language: English
Online access: Full text
Description
Summary: Clustering high-dimensional data is challenging because the features contain a large amount of redundant and irrelevant information. Most existing methods perform feature dimensionality reduction and data clustering either sequentially or jointly, with clustering carried out on the low-dimensional representations. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of those relationships on low-dimensional feature subspace learning. This paper proposes an embarrassingly simple yet effective interactive information bottleneck (IIB) method for clustering high-dimensional co-occurrence data, which performs data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, IIB clusters the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and at the same time learns the low-dimensional feature subspace while maintaining its correlations with the data clustering results obtained in the previous iteration. The two stages are therefore interactive and refine each other. Finally, a new twin “draw-and-merge” method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the superiority and effectiveness of the proposed method.
• A novel interactive information bottleneck is proposed.
• Data clustering and low-dimensional feature learning are simultaneously performed.
• Experiments on four real-world datasets show the superiority of the proposed method.
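The summary speaks of preserving correlations between data clusters and reduced features. As a rough illustration only, the sketch below measures that kind of correlation as mutual information on a toy co-occurrence table and shows how much of it survives when rows and columns are merged into clusters. It is not the authors' IIB objective or their "draw-and-merge" optimizer; the function names, the toy counts, and the cluster labels are all assumptions made for this example.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (in nats) of the joint distribution given by a count table."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                        # joint distribution p(a, b)
    pa = p.sum(axis=1, keepdims=True)      # row marginal p(a)
    pb = p.sum(axis=0, keepdims=True)      # column marginal p(b)
    mask = p > 0                           # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p[mask] * np.log(p[mask] / (pa @ pb)[mask])))

def merge_by_labels(counts, row_labels, col_labels):
    """Sum counts within each (row cluster, column cluster) pair."""
    counts = np.asarray(counts, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    rows = np.eye(row_labels.max() + 1)[row_labels]   # one-hot, shape (n, k)
    cols = np.eye(col_labels.max() + 1)[col_labels]   # one-hot, shape (d, m)
    return rows.T @ counts @ cols                     # shape (k, m)

# Hypothetical toy data: 4 documents x 6 words with two visible blocks.
counts = np.array([
    [4, 3, 5, 0, 0, 1],
    [5, 4, 3, 1, 0, 0],
    [0, 1, 0, 4, 5, 3],
    [1, 0, 0, 3, 4, 5],
])
doc_clusters = [0, 0, 1, 1]             # assumed data clustering
good_features = [0, 0, 0, 1, 1, 1]      # feature grouping aligned with the blocks
bad_features = [0, 1, 0, 1, 0, 1]       # feature grouping that ignores the blocks

full_mi = mutual_information(counts)
good_mi = mutual_information(merge_by_labels(counts, doc_clusters, good_features))
bad_mi = mutual_information(merge_by_labels(counts, doc_clusters, bad_features))
print(f"full={full_mi:.3f}  aligned={good_mi:.3f}  misaligned={bad_mi:.3f}")
```

Merging can never increase mutual information, so the aligned grouping stays below the full table's value, while the misaligned grouping discards most of what the data clusters could say about the features. This is the intuition for why the feature grouping and the data clustering benefit from being chosen with reference to each other.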
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2021.107837