Proportionate feature selection - A pre-processing step for clustering

Bibliographic details
Main authors: Sekhon, J.S., Gopalkrishnan, V., Keong, N.W.
Format: Conference proceedings
Language: English
Description
Summary: The accuracy and efficiency of clustering algorithms depend greatly on the input data. Removing unimportant features, such as those that are redundant or affected by noise, can therefore help form better clusters in less time. The features finally chosen should also represent the original dataset as faithfully as possible; in other words, the underlying structure of the original dataset should be the same as that of the dataset containing only the selected features. In this paper we propose a technique that selects a subset of features that best represents the entire dataset. The technique is based on two measures: a Distance Measure and a Similarity Measure. We first group similar features and then select a proportionate number of features from each group. We perform experiments on a gene expression microarray dataset, and our experimental results show that using our technique as a pre-processing step significantly increases the quality of the clusters generated by the underlying K-means algorithm. We also demonstrate that our approach is better than other contemporary pre-processing filters.
ISSN: 1062-922X, 2577-1655
DOI: 10.1109/ICSMC.2008.4811691
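
The summary describes a two-step idea: group similar features, then pick a number of features from each group in proportion to the group's size. The sketch below illustrates only that general idea and is not the authors' method. This record does not define the paper's Distance Measure or Similarity Measure, so the code assumes absolute Pearson correlation as the similarity, a greedy threshold rule for grouping, and variance as the within-group ranking. All function names, the `threshold` parameter, and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def variance(x):
    """Population variance, used here as an assumed within-group ranking score."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


def group_features(columns, threshold=0.9):
    """Greedy grouping (an assumption): a feature joins the first group whose
    seed feature it correlates with at |r| >= threshold, else starts a new group."""
    groups = []  # each group is a list of feature indices; group[0] is its seed
    for j, col in enumerate(columns):
        for group in groups:
            if abs(pearson(columns[group[0]], col)) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def allocate(sizes, k):
    """Split k slots across groups in proportion to their sizes,
    using largest-remainder rounding so the counts sum to exactly k."""
    total = sum(sizes)
    if not 0 <= k <= total:
        raise ValueError("k must be between 0 and the number of features")
    quotas = [k * s / total for s in sizes]
    counts = [math.floor(q) for q in quotas]
    leftover = k - sum(counts)
    by_remainder = sorted(range(len(sizes)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def proportionate_select(columns, k, threshold=0.9):
    """Return the sorted indices of k features, drawn proportionately from each group."""
    groups = group_features(columns, threshold)
    counts = allocate([len(g) for g in groups], k)
    selected = []
    for group, count in zip(groups, counts):
        ranked = sorted(group, key=lambda j: variance(columns[j]), reverse=True)
        selected.extend(ranked[:count])
    return sorted(selected)


# Toy data: each inner list is one feature measured over five samples.
columns = [
    [1, 2, 3, 4, 5],            # feature 0
    [2, 4, 6, 8, 10],           # feature 1, a scaled copy of feature 0
    [1.1, 2.0, 2.9, 4.2, 5.0],  # feature 2, a noisy copy of feature 0
    [5, 4, 3, 2, 1],            # feature 3, feature 0 reversed
    [1, -1, 1, -1, 1],          # feature 4, uncorrelated with feature 0
    [2, -2, 2, -2, 2],          # feature 5, a scaled copy of feature 4
]

print(proportionate_select(columns, k=3))
```

On this toy data the features fall into one group of four and one group of two, so a budget of three features is split two to one. The selected columns could then be passed to any K-means implementation in place of the full feature set.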