Distributed computation of the knn graph for large high-dimensional point sets
Published in: | Journal of parallel and distributed computing 2007-03, Vol.67 (3), p.346-359 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over 100 processors and indicate that similar speedup can be obtained with several hundred processors. |
---|---|
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2006.10.004 |