KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data

Bibliographic details
Published in:	IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021-06, Vol. 51 (6), p. 3939-3953
Main authors:	Chen, Yewang; Zhou, Lida; Pei, Songwen; Yu, Zhiwen; Chen, Yi; Liu, Xin; Du, Jixiang; Xiong, Naixue
Format:	Article
Language:	English
Subjects:	Algorithms; Approximation algorithms; Big Data; Clustering; Clustering algorithms; Complexity theory; Computer science; DBSCAN; Density distribution; FLANN; kNN; KNN-BLOCK DBSCAN; Partitioning algorithms; Vegetation

Description
Summary:	Large-scale data clustering is essential to big data problems. However, no existing approach is "optimal" for big data because of high computational complexity, so it remains a great challenge. In this article, a simple but fast approximate DBSCAN, namely KNN-BLOCK DBSCAN, is proposed based on two findings: 1) the problem of identifying whether a point is a core point is, in fact, a kNN problem, and 2) a point has a density distribution similar to that of its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise). KNN-BLOCK DBSCAN uses a fast approximate kNN algorithm, namely FLANN, to detect core-blocks (CBs), noncore-blocks, and noise-blocks, within each of which all points have the same type. A fast algorithm for merging CBs and assigning noncore points to the proper clusters is also devised to speed up the clustering process. The experimental results show that KNN-BLOCK DBSCAN is an effective approximate DBSCAN algorithm with high accuracy, and that it outperforms other current variants of DBSCAN, including ρ-approximate DBSCAN and AnyDBC.
ISSN:	2168-2216 2168-2232
DOI:	10.1109/TSMC.2019.2956527
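
As a rough illustration of the first finding in the summary (core-point identification as a kNN problem), the sketch below compares the usual neighbor-count test with a test on the distance to the MinPts-th nearest neighbor. It is not the authors' implementation: it uses a brute-force exact search where the article uses FLANN, and the function names, the sample points, and the convention that a point counts as its own first neighbor are assumptions made for this example.

```python
import math


def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    Assumption: the point itself counts as the 1st neighbor (distance 0),
    matching the DBSCAN convention that a point lies in its own eps-neighborhood.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]


def is_core_by_knn(points, i, eps, min_pts):
    """kNN formulation: core iff the min_pts-th nearest neighbor is within eps."""
    if min_pts > len(points):
        return False  # fewer than min_pts points exist at all
    return kth_neighbor_distance(points, i, min_pts) <= eps


def is_core_by_count(points, i, eps, min_pts):
    """Classic DBSCAN formulation: core iff at least min_pts points lie within eps."""
    within = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return within >= min_pts


# Hypothetical sample data: a tight group of four points and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
eps, min_pts = 0.2, 3

knn_labels = [is_core_by_knn(points, i, eps, min_pts) for i in range(len(points))]
count_labels = [is_core_by_count(points, i, eps, min_pts) for i in range(len(points))]

print(knn_labels)    # [True, True, True, True, False]
print(count_labels)  # same result from the neighbor-count test
```

Both tests agree because at least MinPts points lie within eps of a point exactly when its MinPts-th nearest neighbor does. In this framing, an approximate kNN index could replace the sorted brute-force scan, at the cost of occasionally misjudging points near the eps boundary.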