Proportionate feature selection - A pre-processing step for clustering
Main authors: | , , |
---|---|
Format: | Conference proceedings |
Language: | eng |
Subjects: | |
Abstract: | The accuracy and efficiency of clustering algorithms depend greatly on the input data. Removing unimportant features, such as those that are redundant or affected by noise, can therefore help form better clusters in less time. The features finally chosen should also represent the original dataset as faithfully as possible; in other words, the underlying structure of the dataset containing only the selected features should match that of the original dataset. In this paper we propose a technique that selects a subset of features that best represents the entire dataset. The technique is based on two measures: a distance measure and a similarity measure. We first group similar features and then select a proportionate number of features from each group. We perform experiments on a gene expression microarray dataset, and our experimental results show that using our technique as a pre-processing step significantly increases the quality of the clusters generated by the underlying K-means algorithm. We also demonstrate that our approach performs better than other contemporary pre-processing filters. |
---|---|
ISSN: | 1062-922X 2577-1655 |
DOI: | 10.1109/ICSMC.2008.4811691 |
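
The two-step idea in the abstract (group similar features, then take a proportionate number from each group) can be illustrated with a minimal sketch. The abstract does not define its distance and similarity measures, so everything below is an assumption rather than the paper's method: similarity is taken to be absolute Pearson correlation, grouping is a simple greedy pass, and the function names `group_features`, `proportionate_counts`, and `select_features` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_features(columns, threshold=0.9):
    """Greedy grouping (an assumed stand-in for the paper's similarity measure):
    a feature joins the first group whose first member it correlates with
    at |r| >= threshold, otherwise it starts a new group."""
    groups = []
    for j, col in enumerate(columns):
        for g in groups:
            if abs(pearson(columns[g[0]], col)) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups


def proportionate_counts(group_sizes, k):
    """Split k selections across groups in proportion to group size,
    using largest-remainder rounding so the counts sum to exactly k."""
    total = sum(group_sizes)
    if not 0 <= k <= total:
        raise ValueError("k must be between 0 and the number of features")
    quotas = [k * s / total for s in group_sizes]
    counts = [int(q) for q in quotas]
    # Hand the leftover slots to the groups with the largest fractional parts.
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in order[: k - sum(counts)]:
        counts[i] += 1
    return counts


def select_features(columns, k, threshold=0.9):
    """Return the indices of k features, chosen proportionately from each group.
    Within a group the first members are taken; this choice is a placeholder."""
    groups = group_features(columns, threshold)
    counts = proportionate_counts([len(g) for g in groups], k)
    selected = []
    for g, c in zip(groups, counts):
        selected.extend(g[:c])
    return sorted(selected)
```

Each entry of `columns` is one feature's values across all samples. The reduced feature set would then be passed to a clustering algorithm such as K-means; how features are ranked within a group is left open here, since the abstract does not say.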