Antlion Optimizer Algorithm Modification for Initial Centroid Determination in K-means Algorithm

Bibliographic details
Published in: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Online), 2023-08, Vol. 7 (4), p. 870-883
Main authors: Nanang Lestio Wibowo, Moch Arief Soeleman, Ahmad Zainul Fanani
Format: Article
Language: English
Online access: Full text
Description
Abstract: Clustering is the grouping of data and is widely used in data mining. K-means is one of the most popular clustering algorithms because it is easy to use and fast. It partitions the data into k clusters by distance to the cluster centroids and chooses the initial centroids at random. A careless choice of initial centroids can lead to poor clustering and convergence to a local optimum. One way to improve K-means is therefore to use an optimization method to determine the initial centroids. This study uses a modified Antlion Optimizer (ALO) to determine the initial centroids, as an alternative to random initialization, in order to obtain better clustering results. The proposed method improves clustering compared with standard K-means and K-means++. The methods were evaluated by the sum of intra-cluster distances (SICD) on five UCI datasets (iris, wine, glass, ecoli, and cancer), and on every dataset the best SICD value was obtained by the proposed method. When the methods are ranked by their best SICD value per dataset, the proposed method ranks first on the iris, wine, and cancer datasets, and shares first rank with K-means++ on the ecoli and glass datasets. The proposed method also ranks first by average rank, which supports the conclusion that it improves clustering results and can serve as an alternative way of determining the initial cluster centers for K-means.
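The abstract's point about initial centroids and local optima can be illustrated with a small sketch. This is not the authors' modified ALO, which the abstract does not describe in detail; it only shows how the same K-means iterations reach different SICD values from different starting centroids. The SICD is assumed here to be the sum of Euclidean distances from each point to its nearest centroid, and the toy data, function names, and starting centroids are all hypothetical.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def sicd(points, centroids):
    """Assumed SICD: sum of Euclidean distances to the nearest centroid."""
    return sum(nearest(p, centroids)[1] for p in points)


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: put every point in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


# Toy data (not a UCI dataset): corners of a 4-by-1 rectangle, k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Careless start: splits the data into top/bottom rows, a local optimum.
poor_init = [(2.0, 0.0), (2.0, 1.0)]
# Better start: one data point from each short side of the rectangle.
good_init = [(0.0, 0.0), (4.0, 1.0)]

poor_sicd = sicd(points, kmeans(points, poor_init))
good_sicd = sicd(points, kmeans(points, good_init))
print(f"SICD from poor init: {poor_sicd:.2f}")  # 8.00
print(f"SICD from good init: {good_sicd:.2f}")  # 2.00
```

From the careless start, K-means stops immediately at a partition four times worse by this measure. An initialization strategy such as K-means++ or an optimizer-based search is meant to avoid starts of this kind.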
ISSN:2580-0760
DOI:10.29207/resti.v7i4.4997