A feature group weighting method for subspace clustering of high-dimensional data

Bibliographic details
Published in:	Pattern recognition 2012, Vol.45 (1), p.434-446
Main authors:	Chen, Xiaojun; Ye, Yunming; Xu, Xiaofei; Huang, Joshua Zhexue
Format:	Article
Language:	English
Subjects:	Algorithms; Applied sciences; Clustering; Clusters; Data mining; Exact sciences and technology; Feature weighting; High-dimensional data analysis; Information theory; Information, signal and communications theory; k-Means; Noise; Optimization; Pattern recognition; Signal and communications theory; Signal representation. Spectral analysis; Signal, noise; Subspace clustering; Subspaces; Telecommunications and information theory; Weighting methods
Online access:	Full text

Description
Abstract:	This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups based on their natural characteristics. Two types of weights are introduced into the clustering process to simultaneously identify the importance of feature groups and of individual features in each cluster. A new optimization model defines the clustering objective, and a new clustering algorithm, FG-k-means, is proposed to solve it. The new algorithm extends k-means with two additional steps that automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data show that the FG-k-means algorithm significantly outperformed four k-means type algorithms (k-means, W-k-means, LAC and EWKM) in almost all experiments. The new algorithm is robust to noise and missing values, which commonly exist in high-dimensional data.
► It is the first method to weight subspaces of feature groups and individual features.
► We propose the FG-k-means algorithm to optimize the new model.
► We present a method to generate data with clusters in subspaces of feature groups.
► We present experimental results of FG-k-means on synthetic and real-life data.
► Experimental results demonstrate that it can be used for feature selection.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2011.06.004
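
Note:	The abstract describes FG-k-means as k-means with two extra steps that update a group weight and a feature weight for each cluster. The sketch below illustrates that loop structure only. The record does not give the update formulas, so the softmax-style weight updates, the parameters lam and eta, and the names softmax_neg and fg_kmeans_sketch are assumptions made for illustration, not the authors' published algorithm.

```python
import numpy as np

def softmax_neg(cost, scale):
    """Map costs to weights that sum to 1; a lower cost gets a larger weight."""
    z = -(cost - cost.min()) / scale   # shift so the largest exponent is 0
    e = np.exp(z)
    return e / e.sum()

def fg_kmeans_sketch(X, groups, k, lam=1.0, eta=1.0, n_iter=20, seed=0):
    """Illustrative two-level weighted k-means (assumed update rules).

    X      : (n, m) data matrix
    groups : length-m sequence giving the group id of each feature
    lam    : assumed smoothing parameter for group weights
    eta    : assumed smoothing parameter for feature weights
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    T = len(group_ids)
    gidx = np.searchsorted(group_ids, groups)  # group index of each feature

    centers = X[rng.choice(n, size=k, replace=False)]
    W = np.full((k, T), 1.0 / T)               # group weights, rows sum to 1
    V = np.empty((k, m))                       # feature weights, sum to 1 per group
    for t in range(T):
        mask = gidx == t
        V[:, mask] = 1.0 / mask.sum()

    for _ in range(n_iter):
        # Step 1: assign each object to the nearest center under weighted distance.
        coef = W[:, gidx] * V                              # (k, m)
        sq = (X[:, None, :] - centers[None, :, :]) ** 2    # (n, k, m)
        labels = (sq * coef[None]).sum(axis=2).argmin(axis=1)

        # Step 2: update centers and per-cluster, per-feature dispersion.
        disp = np.zeros((k, m))
        for l in range(k):
            members = X[labels == l]
            if len(members):
                centers[l] = members.mean(axis=0)
                disp[l] = ((members - centers[l]) ** 2).sum(axis=0)

        for l in range(k):
            # Step 3 (extra): group weights from feature-weighted dispersion.
            group_cost = np.array(
                [(V[l, gidx == t] * disp[l, gidx == t]).sum() for t in range(T)]
            )
            W[l] = softmax_neg(group_cost, lam)
            # Step 4 (extra): feature weights, normalized within each group.
            for t in range(T):
                mask = gidx == t
                V[l, mask] = softmax_neg(W[l, t] * disp[l, mask], eta)

    return labels, centers, W, V
```

Calling fg_kmeans_sketch(X, groups, k) returns the labels from the last assignment step, the cluster centers, a (k, T) matrix of group weights and a (k, m) matrix of feature weights. In this sketch, larger lam or eta values flatten the corresponding weights toward uniform, while smaller values concentrate weight on the groups or features with the least within-cluster dispersion.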