Exploiting the structure of furthest neighbor search for fast approximate results

We present a novel strategy for approximate furthest neighbor search that selects a set of candidate points using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is mo...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Information systems (Oxford) 2019-02, Vol.80, p.124-135
Hauptverfasser:	Curtin, Ryan R., Echauz, Javier, Gardner, Andrew B.
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Approximation Data transmission Farthest neighbor search Furthest neighbor search Hubness and anti-hubness Information systems Information theory Machine learning Mathematical analysis Network hubs Search engines Searching Strategy
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	We present a novel strategy for approximate furthest neighbor search that selects a set of candidate points using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by a study of the behavior of the furthest neighbor search problem, which has significantly different structure than the nearest neighbor search problem, and can be understood with the help of an information-theoretic hardness measure that we introduce. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms, even on the hardest datasets, while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org).
ISSN:	0306-4379 1873-6076
DOI:	10.1016/j.is.2017.12.010