KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data
Published in: | IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2021-06, Vol. 51 (6), p. 3939-3953 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | English |
Summary: | Large-scale data clustering is essential to big data problems. However, no existing approach is "optimal" for big data because of high complexity, so it remains a great challenge. In this article, a simple but fast approximate DBSCAN, namely KNN-BLOCK DBSCAN, is proposed based on two findings: 1) identifying whether a point is a core point is, in fact, a kNN problem and 2) a point has a density distribution similar to that of its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise). KNN-BLOCK DBSCAN uses a fast approximate kNN algorithm, namely FLANN, to detect core-blocks (CBs), noncore-blocks, and noise-blocks, within each of which all points have the same type; a fast algorithm for merging CBs and assigning noncore points to the proper clusters is then used to speed up the clustering process. The experimental results show that KNN-BLOCK DBSCAN is an effective approximate DBSCAN algorithm with high accuracy and that it outperforms other current variants of DBSCAN, including ρ-approximate DBSCAN and AnyDBC. |
---|---|
ISSN: | 2168-2216, 2168-2232 |
DOI: | 10.1109/TSMC.2019.2956527 |
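
The first finding in the summary, that the core-point test is a kNN problem, can be illustrated with a minimal sketch. It is not the authors' implementation: an exact brute-force neighbor search stands in for FLANN, the point itself is assumed to count toward the neighbor total, and the function names, the sample points, and the `eps` and `min_pts` values are hypothetical. Under those assumptions, a point is a core point exactly when the distance to its `min_pts`-th nearest neighbor is at most `eps`.

```python
import math


def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    Assumption: the point itself counts as the first neighbor (distance 0).
    Brute force is used here for clarity; it is not FLANN.
    """
    dists = sorted(math.dist(points[i], q) for q in points)
    return dists[k - 1]


def is_core_by_knn(points, i, eps, min_pts):
    # kNN formulation: the min_pts-th nearest neighbor lies within eps.
    return kth_neighbor_distance(points, i, min_pts) <= eps


def is_core_by_range(points, i, eps, min_pts):
    # Classic DBSCAN formulation: at least min_pts points within eps.
    return sum(math.dist(points[i], q) <= eps for q in points) >= min_pts


# Hypothetical data: a tight group of four points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
eps, min_pts = 0.2, 4

core_flags = [is_core_by_knn(points, i, eps, min_pts) for i in range(len(points))]
print(core_flags)  # [True, True, True, True, False]
```

Both formulations agree on every point because counting at least `min_pts` points within `eps` is the same condition as the `min_pts`-th smallest distance being at most `eps`. The kNN form is what allows a fast approximate nearest-neighbor index to replace the range query.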