A two-phase K-means algorithm for large datasets

Bibliographic details
Published in:	Proceedings of the Institution of Mechanical Engineers. Part C, Journal of mechanical engineering science, 2004-10, Vol.218 (10), p.1269-1273
Main authors:	Pham, D T; Dimov, S S; Nguyen, C D
Format:	Article
Language:	English
Subjects:	Algorithms; Cluster analysis; Data mining; Datasets; Market segmentation; Methods
Online access:	Full text

Description
Summary:	One of the drawbacks of the K-means algorithm is the need for several iterations over datasets before it converges on a solution. Therefore, its application is limited to relatively small datasets. This paper presents a scalable version of the K-means algorithm that employs a buffering technique. The new algorithm, Two-Phase K-means, can robustly find a good solution in only one iteration.
ISSN:	0954-4062, 2041-2983
DOI:	10.1243/0954406042369008