GDPC: generalized density peaks clustering algorithm based on order similarity

Bibliographic details
Published in:	International journal of machine learning and cybernetics 2021-03, Vol.12 (3), p.719-731
Main authors:	Yang, Xiaofei; Cai, Zhiling; Li, Ruijia; Zhu, William
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Clustering; Complex Systems; Computational Intelligence; Control; Data mining; Datasets; Density; Engineering; Euclidean geometry; Machine learning; Mechatronics; Original Article; Pattern Recognition; Robotics; Similarity; Systems Biology
Online access:	Full text

Description
Abstract:	Clustering is a fundamental approach to discovering valuable information in data mining and machine learning. Density peaks clustering (DPC) is a typical density-based clustering method and has received increasing attention in recent years. However, DPC and most of its improvements still suffer from some drawbacks. For example, it is difficult to find peaks in sparse cluster regions, and the assignment of the remaining points tends to cause a domino effect, especially for complicated data. To address these two problems, we propose a generalized density peaks clustering algorithm (GDPC) based on a new order similarity, which is calculated from the order rank of the Euclidean distance between two samples. The order similarity helps to find peaks in sparse regions. In addition, a two-step assignment is used to weaken the domino effect. In general, GDPC can not only discover clusters in datasets of different sizes, dimensions, and shapes, but also address the above two issues. Several experiments on datasets including Lung, COIL20, ORL, USPS, MNIST, breast, and Vote show that our algorithm is effective in most cases.
ISSN:	1868-8071 1868-808X
DOI:	10.1007/s13042-020-01198-0
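
The abstract describes the order similarity only as being derived from the order rank of the Euclidean distance between two samples; the exact formula is not given in this record. As a rough illustration of the rank idea alone, the following sketch computes, for every sample, the rank of every other sample by Euclidean distance. The function name `distance_ranks` and the tie-breaking rule are assumptions made for this example and are not taken from the article.

```python
import math


def distance_ranks(points):
    """Rank every sample j by its Euclidean distance from each sample i.

    ranks[i][j] == 1 means j is the nearest neighbour of i, and
    ranks[i][i] == 0. Ties are broken by index (an assumption made here).
    This is only the rank step, not the article's order similarity.
    """
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        # Put i itself first, then sort the rest by distance, then by index.
        order = sorted(
            range(n),
            key=lambda j: (j != i, math.dist(points[i], points[j]), j),
        )
        for position, j in enumerate(order):
            ranks[i][j] = position
    return ranks


# A tight group and the same group stretched by a factor of 10.
dense = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
sparse = [(x * 10, y * 10) for x, y in dense]

print(distance_ranks(dense))
print(distance_ranks(sparse))
```

Both calls print the same rank matrix, because ranks depend only on the ordering of distances and not on their magnitude. This is one plausible reading of why a rank-based similarity could help locate peaks in sparse regions, where raw distances are large; how GDPC turns these ranks into a similarity and a density estimate is specified in the article itself.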