Staleness-Reduction Mini-Batch K-Means

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, October 2024, Vol. 35 (10), pp. 14424-14436
Main authors:	Zhu, Xueying; Sun, Jie; He, Zhenhao; Jiang, Jiantong; Wang, Zeke
Format:	Article
Language:	English
Subjects:	Clustering; Clustering algorithms; Computational efficiency; Computer science; Convergence; Iterative methods; K-means (km); machine learning; Parallel processing; staleness-reduction

Description
Abstract:	K-means (km) is a clustering algorithm that has been widely adopted due to its simple implementation and high clustering quality. However, the standard km suffers from high computational complexity and is therefore time-consuming. Accordingly, the mini-batch (mbatch) km was proposed to significantly reduce computational costs: it updates centroids after performing distance computations on just a mini-batch of samples rather than the full batch. Even though the mbatch km converges faster, it lowers convergence quality because it introduces staleness during iterations. To this end, in this article, we propose the staleness-reduction mbatch (srmbatch) km, which achieves the best of both worlds: low computational costs like the mbatch km and high clustering quality like the standard km. Moreover, srmbatch still exposes massive parallelism, so it can be implemented efficiently on multicore CPUs and many-core GPUs. The experimental results show that srmbatch converges up to 40×-130× faster than mbatch when reaching the same target loss, and that it reaches a final loss 0.2%-1.7% lower than that of mbatch.
ISSN:	2162-237X, 2162-2388
DOI:	10.1109/TNNLS.2023.3279122
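The abstract contrasts standard km with mbatch km, which updates centroids after computing distances for only a mini-batch of samples. The following is a minimal Python sketch of the classic mini-batch update (per-centroid learning rate 1/count, in the style of Sculley's 2010 web-scale k-means), assuming Euclidean distance, batches drawn with replacement, and a loss equal to the sum of squared distances to the nearest centroid. All function names are hypothetical. This is the baseline mbatch km, not the authors' srmbatch; the abstract does not describe the staleness-reduction step, so it is not attempted here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (the first one wins on ties)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans_loss(data, centroids):
    """Assumed loss: sum of squared distances from each point to its nearest centroid."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in data)


def minibatch_kmeans(data, k, batch_size, iters, init=None, seed=0):
    """Hypothetical sketch of baseline mini-batch k-means, NOT the paper's srmbatch.

    `init` optionally fixes the starting centroids; otherwise k data points
    are chosen at random (without replacement).
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, k)
    centroids = [list(p) for p in start]
    counts = [0] * len(centroids)  # samples absorbed by each centroid so far
    for _ in range(iters):
        # Draw a mini-batch (with replacement) instead of scanning all samples.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Distances are computed only for the mini-batch, against the
        # centroids as they stand at the start of this iteration.
        assign = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assign):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centroid learning rate
            # Running-mean update: once a centroid has received samples, it is the
            # mean of every sample ever assigned to it, including earlier iterations.
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]
    return centroids


if __name__ == "__main__":
    # Two tight, well-separated groups of 2-D points.
    pts = [(x, y) for x in (0, 1) for y in (0, 1)]
    pts += [(10 + x, 10 + y) for x in (0, 1) for y in (0, 1)]
    cents = minibatch_kmeans(pts, k=2, batch_size=4, iters=50, init=[(0, 0), (10, 10)])
    print(cents, kmeans_loss(pts, cents))
```

Because each centroid in this sketch is a running mean over all samples it has ever absorbed, assignments made against older centroid positions keep influencing it. This illustrates one way outdated information can linger in mini-batch updates; whether it matches the staleness analyzed in the paper is an assumption.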