Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering
Published in: | Pattern analysis and applications : PAA, 2020-02, Vol. 23 (1), p. 455-466 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Data distribution has a significant impact on clustering results. This study focuses on the effect of cluster size distribution on clustering, namely the uniform effect of k-means and fuzzy c-means (FCM) clustering. We first review related work on k-means and FCM clustering. Then, a structure decomposition analysis of the objective functions of k-means and FCM is presented. Afterward, extensive experiments are conducted on synthetic two-dimensional and three-dimensional data sets and on real-world data sets from the UCI machine learning repository. The results demonstrate that FCM has a stronger uniform effect than k-means clustering. They also reveal that the fuzzifier value m = 2 in FCM, which has been widely adopted in many applications, is not a good choice, particularly for data sets with great variation in cluster sizes. Therefore, for data sets with significantly uneven cluster sizes, a smaller fuzzifier value is preferred for FCM clustering, and k-means clustering is a better choice than FCM clustering. |
---|---|
ISSN: | 1433-7541, 1433-755X |
DOI: | 10.1007/s10044-019-00783-6 |
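
The abstract's remark about the fuzzifier m can be illustrated with the textbook FCM update rules. The following is a minimal sketch, not the authors' implementation: the function names `fcm_memberships` and `fcm_centers` and the toy coordinates are assumptions made for illustration. It computes the standard membership u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) and the membership-weighted center update.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (textbook FCM update)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with at least one center: share crisp
            # membership among those centers to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Return cluster centers as weighted means of the points, with weights u_ij ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


# Hypothetical toy case: one point at distance 1 from the first center
# and distance 3 from the second.
points = [(1.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u_m2 = fcm_memberships(points, centers, m=2.0)[0]
u_m15 = fcm_memberships(points, centers, m=1.5)[0]
print("m = 2.0:", [round(u, 4) for u in u_m2])
print("m = 1.5:", [round(u, 4) for u in u_m15])
```

In this toy case the membership in the nearer cluster is about 0.9 for m = 2 and about 0.988 for m = 1.5. A smaller fuzzifier therefore yields crisper assignments, so distant points contribute less to another cluster's center update. This is one plausible way to read the abstract's preference for a smaller m when cluster sizes are uneven, though the sketch does not reproduce the paper's experiments.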