A two-phase K-means algorithm for large datasets

Bibliographic details
Published in: Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 2004-10, Vol. 218 (10), p. 1269-1273
Main authors: Pham, D T, Dimov, S S, Nguyen, C D
Format: Article
Language: English
Description
Summary: One of the drawbacks of the K-means algorithm is the need for several iterations over datasets before it converges on a solution. Therefore, its application is limited to relatively small datasets. This paper presents a scalable version of the K-means algorithm that employs a buffering technique. The new algorithm, Two-Phase K-means, can robustly find a good solution in only one iteration.
ISSN: 0954-4062, 2041-2983
DOI: 10.1243/0954406042369008
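
Illustrative sketch
The summary above names a buffering technique and a single iteration over the data but gives no algorithmic detail, so the following is not the authors' method. It is a minimal, generic sketch of one way a buffer can limit K-means to a single pass: each buffer is compressed into a few weighted centroids, and those centroids are then clustered again. All names (weighted_kmeans, two_phase_sketch, buffer_size, k_buffer) and all default values are assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iters: int = 20,
                    seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Plain Lloyd-style K-means on weighted points (hypothetical helper).

    Returns the centroids and the total weight assigned to each one.
    """
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # random initial centroids
    masses = [0.0] * k
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        masses = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            masses[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each centroid to the weighted mean of its members;
        # a centroid with no members stays where it is.
        centroids = [
            tuple(s / masses[j] for s in sums[j]) if masses[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return centroids, masses


def chunks(stream: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Yield successive buffers of at most `size` points from the stream."""
    buf: List[Point] = []
    for p in stream:
        buf.append(p)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def two_phase_sketch(stream: Iterable[Point], k: int, buffer_size: int = 1000,
                     k_buffer: Optional[int] = None) -> Tuple[List[Point], List[float]]:
    """Assumed two-stage scheme; NOT taken from the paper."""
    k_buffer = k_buffer or 4 * k  # assumption: over-cluster each buffer
    summary: List[Point] = []
    summary_w: List[float] = []
    # Stage 1: read the data once, compressing each buffer to weighted centroids.
    for buf in chunks(stream, buffer_size):
        cents, masses = weighted_kmeans(buf, [1.0] * len(buf), min(k_buffer, len(buf)))
        for c, m in zip(cents, masses):
            if m > 0:  # drop centroids that ended up with no members
                summary.append(c)
                summary_w.append(m)
    # Stage 2: cluster the much smaller weighted summary held in memory.
    return weighted_kmeans(summary, summary_w, min(k, len(summary)))
```

In this sketch the raw data is read exactly once in stage 1, and all repeated K-means iterations run either on a single buffer or on the small weighted summary, so memory use is bounded by buffer_size plus the number of summary centroids.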