A new data clustering algorithm based on critical distance methodology
•A new CDC algorithm.•6 indicators to evaluate the results.•26 experiments had been conducted. A variety of algorithms have recently emerged in the field of cluster analysis. Consequently, based on the distribution nature of the data, an appropriate algorithm can be chosen for the purpose of cluster...
Gespeichert in:
Veröffentlicht in: | Expert systems with applications 2019-09, Vol.129, p.296-310 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | •A new CDC algorithm.•6 indicators to evaluate the results.•26 experiments had been conducted.
A variety of algorithms have recently emerged in the field of cluster analysis. Consequently, based on the distribution nature of the data, an appropriate algorithm can be chosen for the purpose of clustering. It is difficult for a user to decide a priori which algorithm would be the most appropriate for a given dataset. Algorithms based on graphs provide good results for this task. However, these algorithms are vulnerable to outliers with limited information about edges contained in the tree to split a dataset. Thus, in several fields, the need for better clustering algorithms increases and for this reason utilizing robust and dynamic algorithms to improve and simplify the whole process of data clustering has become an urgent need. In this paper, we propose a novel distance-based clustering algorithm called the critical distance clustering algorithm. This algorithm depends on the Euclidean distance between data points and some basic mathematical statistics operations. The algorithm is simple, robust, and flexible; it works with quantitative data that are real-valued, not qualitative, and categorical with different dimensions. In this work, 26 experiments are conducted using different types of real and synthetic datasets taken from different fields. The results prove that the new algorithm outperforms some popular clustering algorithms such as MST-based clustering, K-means, and Dbscan. Moreover, the algorithm can precisely produce more reasonable clusters even when the dataset contains outliers and without specifying any parameters in advance. It also provides a number of indicators to evaluate the established clusters and prove the validity of the clustering. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2019.03.051 |