A Novel Algorithm for Initial Cluster Center Selection

Bibliographic details
Published in: IEEE Access, 2019, Vol. 7, pp. 74683-74693
Main authors: Li, Yating; Cai, Jianghui; Yang, Haifeng; Zhang, Jifu; Zhao, Xujun
Format: Article
Language: English
Description
Abstract: Clustering is one of the most important techniques in data mining and has long attracted attention. Most clustering algorithms face challenges such as the difficulty of selecting cluster centers, the manual determination of the number of clusters $K$, low clustering accuracy, and uneven clustering efficiency across different data sets. To address the difficulty of choosing cluster centers, this paper proposes a new algorithm for quickly selecting the initial cluster centers. Because cluster centers are generally data points with higher density and a smaller radius threshold that lie far away from each other, the method uses MNN ($M$ nearest neighbors), density, and distance to determine the initial cluster centers. First, the neighborhood radius $r$ of each point is measured by MNN based on distance, and the average of all $r$ is denoted $\bar{r}$; second, the density $\rho$ of each point in the region within $\bar{r}$ is calculated; then a factor $f$ is defined to describe the probability that a point becomes a cluster center, and the initial cluster centers are chosen from the candidates with larger $f$. Finally, the proposed method is tested on 12 groups of typical benchmark data sets and applied to stellar spectral data from the LAMOST survey. Comparison with six other algorithms indicates that the initial cluster centers obtained by this method are of higher quality than those of the six algorithms. Moreover, the initial cluster centers of the spectral data agree well with the actual stellar classifications.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2921320
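
The three steps named in the abstract (a per-point neighborhood radius from the $M$ nearest neighbors, a density counted within the average radius, and a factor $f$ used to rank candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the formulas, so several choices here are assumptions: the radius $r$ is taken to be the distance to the $M$-th nearest neighbor, the density $\rho$ is a plain neighbor count, and $f$ is taken to be $\rho$ multiplied by the distance to the nearest point of higher density. The function name `select_initial_centers` and its parameters are hypothetical.

```python
import numpy as np


def select_initial_centers(X, n_centers, M=5):
    """Return indices of n_centers candidate initial cluster centers.

    Illustrative sketch only; the formulas for r, rho and f are assumptions,
    not the definitions used in the paper. Requires M < number of points.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 1 (assumed form): r_i is the distance to the M-th nearest neighbor.
    # Column 0 of each sorted row is the point itself (distance 0).
    r = np.sort(D, axis=1)[:, M]
    r_bar = r.mean()

    # Step 2 (assumed form): rho_i counts the other points within r_bar.
    rho = (D <= r_bar).sum(axis=1) - 1

    # Step 3 (assumed form): f_i = rho_i * delta_i, where delta_i is the
    # distance to the nearest point ranked higher in density. Ties in rho
    # are broken by index order through the stable sort.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(X))
    delta[order[0]] = D[order[0]].max()  # densest point: use its farthest distance
    for rank in range(1, len(order)):
        i = order[rank]
        delta[i] = D[i, order[:rank]].min()

    f = rho * delta

    # Candidates with the largest f are returned first.
    return np.argsort(-f, kind="stable")[:n_centers]
```

Under these assumptions, a call such as `select_initial_centers(X, 2, M=5)` on two well-separated groups of points would be expected to return one index from each group, since a point needs both high density and a large distance from denser points to obtain a large $f$. The dense distance matrix keeps the sketch short but uses memory quadratic in the number of points.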