Accelerating EM clustering to find high-quality solutions

Bibliographic details
Published in: Knowledge and information systems, 2005-02, Vol. 7 (2), p. 135-157
Main authors: Ordonez, Carlos; Omiecinski, Edward
Format: Article
Language: English
Online access: Full text
Description
Summary: Clustering is one of the most important techniques used in data mining. This article focuses on the EM clustering algorithm. Two fundamental aspects are studied: achieving faster convergence and finding higher-quality clustering solutions. This work introduces several improvements to the EM clustering algorithm, the most important being periodic M steps during initial iterations, reseeding of low-weight clusters, and splitting of high-weight clusters. These improvements lead to two important parameters: the number of M steps per iteration and a weight threshold for reseeding low-weight clusters. Experiments show how frequently the M step must be executed and what weight threshold values make EM reach higher-quality solutions. In general, the improved EM clustering algorithm finds higher-quality solutions than the classical EM algorithm and converges in fewer iterations.
ISSN: 0219-1377, 0219-3116
DOI: 10.1007/s10115-003-0141-6
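
The reseeding idea mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say how a cluster is reseeded, so the sketch assumes a diagonal-covariance Gaussian mixture, resets each low-weight cluster to a randomly chosen data point, and reseeds only during the first half of the iterations. The names `em_with_reseeding`, `weight_threshold`, and `init_means` are hypothetical. Periodic M steps and splitting of high-weight clusters are left out.

```python
import numpy as np


def em_with_reseeding(X, k, n_iter=50, weight_threshold=0.02,
                      init_means=None, seed=0):
    """EM for a diagonal-covariance Gaussian mixture with low-weight reseeding.

    Illustrative sketch only. All parameter names are assumptions of this
    example and are not taken from the article.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Start from k random data points unless initial means are supplied.
    if init_means is None:
        means = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        means = np.array(init_means, dtype=float)
    global_var = X.var(axis=0) + 1e-6
    variances = np.tile(global_var, (k, 1))
    weights = np.full(k, 1.0 / k)

    for it in range(n_iter):
        # E step: log(weight_j * N(x_i | mean_j, diag(var_j))) for all i, j.
        diff = X[:, None, :] - means[None, :, :]               # (n, k, d)
        log_p = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
        log_p += np.log(weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)                        # (n, k)

        # M step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None]
        variances += 1e-6  # floor to keep variances strictly positive

        # Reseed clusters whose weight fell below the threshold
        # (assumption: only in the first half of the run, so that the
        # final iterations can converge undisturbed).
        low = weights < weight_threshold
        if low.any() and it < n_iter // 2:
            means[low] = X[rng.choice(n, size=int(low.sum()), replace=False)]
            variances[low] = global_var
            weights[low] = weight_threshold
            weights = weights / weights.sum()

    return weights, means, variances
```

Calling `em_with_reseeding(X, k=5, weight_threshold=0.05)` on an `(n, d)` array returns the mixture weights, cluster means, and per-dimension variances. Raising `weight_threshold` makes reseeding more aggressive; setting it to 0 disables reseeding, so the loop reduces to plain EM for this mixture model.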