Hybrid PCA-ILGC clustering approach for high dimensional data

Bibliographic details
Main authors: Musdholifah, A., Hashim, S. Z. M., Ngah, R.
Format: Conference paper
Language: English
Abstract: The rapid growth in the availability of high-dimensional datasets has left conventional approaches insufficient for extracting hidden, useful information. As a result, researchers today are challenged to develop new techniques for massive high-dimensional data, which is large not only in the number of records but also in the number of attributes. To improve the effectiveness and accuracy of mining tasks on high-dimensional data, an efficient dimensionality reduction method should be applied during data preprocessing, before a clustering technique is used. Many clustering algorithms have been proposed and used to discover useful information in datasets. Iterative Local Gaussian Clustering (ILGC) is a simple density-based clustering technique that has successfully discovered the number of clusters present in a dataset. In this paper, we propose using Principal Component Analysis (PCA) to preprocess the data prior to ILGC clustering, in order to simplify the analysis and visualization of multidimensional datasets. The proposed approach is validated on benchmark classification datasets. In addition, the performance of the proposed hybrid PCA-ILGC clustering approach is compared with the original ILGC, basic k-means, and hybridized k-means. The experimental results indicate that the proposed approach obtains clusters with higher accuracy and reduces the time taken to process the data.
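The abstract describes a two-stage pipeline: reduce dimensionality with PCA, then cluster the reduced data. The sketch below illustrates that general pattern only, under stated assumptions. PCA is computed with NumPy's SVD. Because this record does not specify the ILGC algorithm, a basic k-means (one of the baselines named in the abstract) stands in for the clustering step. The function names `pca_reduce` and `kmeans` and the toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Basic Lloyd's k-means; a hypothetical stand-in for ILGC, not ILGC itself."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy data: two well-separated groups in 50 dimensions.
    a = rng.normal(0.0, 1.0, size=(100, 50))
    b = rng.normal(5.0, 1.0, size=(100, 50))
    X = np.vstack([a, b])
    Z = pca_reduce(X, n_components=2)  # 50 attributes -> 2 components
    labels = kmeans(Z, k=2)
    print(Z.shape, np.bincount(labels))
```

Clustering in the reduced space means each distance computation involves 2 coordinates instead of 50, which is one plausible source of the processing-time savings the abstract reports; the actual gains for PCA-ILGC are those measured in the paper, not by this sketch.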
ISSN: 1062-922X; 2577-1655
DOI:10.1109/ICSMC.2012.6377760