Efficient nearest neighbor search in high dimensional hamming space

Bibliographic details
Published in: Pattern Recognition, March 2020, Vol. 99, p. 107082, Article 107082
Main authors: Fan, Bin; Kong, Qingqun; Zhang, Baoqian; Liu, Hongmin; Pan, Chunhong; Lu, Jiwen
Format: Article
Language: English
Description
Abstract:
• We address the problem of fast approximate nearest neighbor (ANN) search in high-dimensional Hamming space.
• Two existing techniques, Locality Preserving Projections (LPP) and the KD-tree, are combined in a novel way to solve this problem, although neither of them alone is adequate for it.
• A detailed analysis of the complexity and performance of the proposed method is provided.
• Extensive experiments comparing against the state of the art validate the effectiveness of the proposed method. The proposed method improves the search accuracy of the state of the art by 30 percentage points (from 60% to 90%) while remaining two orders of magnitude faster than a linear scan on a database of one million items. The speed-up factor is even higher for larger databases.

Fast approximate nearest neighbor search has been well studied for real-valued vectors; however, methods for binary descriptors are less developed. This paper addresses the problem by drawing on well-established techniques from Euclidean space. To this end, the binary descriptors are first mapped into low-dimensional floating-point vectors such that the neighborhood structure of the original Hamming space is preserved in the mapped Euclidean space as much as possible. Then, a KD-tree is used to partition the mapped Euclidean space so that approximate nearest neighbors of a given query point can be found quickly. Because neighborhoods are preserved, this is equivalent to filtering out a subset of nearest neighbor candidates in the original Hamming space. Finally, Hamming ranking is applied to this small set of candidates to find the approximate nearest neighbor in the original Hamming space, at only a fraction of the running time of a brute-force linear scan.
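The three steps described above (neighborhood-preserving projection, KD-tree candidate filtering, Hamming re-ranking) can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' implementation: codes are stored as unpacked 0/1 arrays, the projection follows the textbook LPP recipe (PCA step, heat-kernel k-nearest-neighbor graph, generalized eigenproblem) with a hypothetical kernel-width heuristic, and SciPy's exact `cKDTree` query stands in for whatever KD-tree search the paper actually uses. The names `hamming_matrix`, `fit_lpp`, `LppKdIndex` and `n_candidates` are illustrative only.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial import cKDTree


def hamming_matrix(a, b):
    """Pairwise Hamming distances between the rows of two 0/1 matrices."""
    a = a.astype(np.int64)
    b = b.astype(np.int64)
    # Count positions where a is 1 and b is 0, plus positions where a is 0 and b is 1.
    return a @ (1 - b).T + (1 - a) @ b.T


def fit_lpp(train_bits, out_dim=8, n_neighbors=10):
    """Learn an LPP-style projection from 0/1 training codes (illustrative sketch).

    For 0/1 vectors the Hamming distance equals the squared Euclidean distance,
    so exp(-d_H / t) is exactly the usual LPP heat-kernel weight.
    """
    x = train_bits.astype(np.float64)
    mean = x.mean(axis=0)
    xc = x - mean
    # PCA step (as in the standard LPP formulation) removes degenerate directions.
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    pca = vt[s > 1e-8 * s[0]].T
    z = xc @ pca

    dist = hamming_matrix(train_bits, train_bits).astype(np.float64)
    n = dist.shape[0]
    t = dist.mean()  # hypothetical heuristic for the kernel width
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nn.ravel()
    w = np.zeros((n, n))
    w[rows, cols] = np.exp(-dist[rows, cols] / t)
    w = np.maximum(w, w.T)  # symmetric k-NN affinity graph

    deg = w.sum(axis=1)
    lap = np.diag(deg) - w
    # Generalized eigenproblem Z^T L Z a = lambda Z^T D Z a; eigh sorts eigenvalues
    # ascending, so the first columns are the most neighborhood-preserving directions.
    _, vecs = eigh(z.T @ lap @ z, z.T @ (deg[:, None] * z))
    return mean, pca @ vecs[:, :out_dim]


class LppKdIndex:
    """Hypothetical index: KD-tree over projected codes plus Hamming re-ranking."""

    def __init__(self, db_bits, mean, proj):
        self.db_bits = db_bits
        self.mean = mean
        self.proj = proj
        self.tree = cKDTree((db_bits - mean) @ proj)

    def query(self, q_bits, n_candidates=50):
        """Return (database index, Hamming distance) of the best candidate."""
        y = (q_bits - self.mean) @ self.proj
        # Candidate filtering: nearest points in the mapped Euclidean space.
        _, cand = self.tree.query(y, k=n_candidates)
        cand = np.atleast_1d(cand)
        # Hamming ranking on the small candidate set only.
        d = hamming_matrix(q_bits[None, :], self.db_bits[cand])[0]
        best = int(np.argmin(d))
        return int(cand[best]), int(d[best])


rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(5000, 64), dtype=np.uint8)  # random 64-bit codes
mean, proj = fit_lpp(db[:500], out_dim=8)  # learn the projection on a subset
index = LppKdIndex(db, mean, proj)
print(index.query(db[123]))  # a stored code should be returned at Hamming distance 0
```

In this sketch, `n_candidates` controls the speed/accuracy trade-off the abstract describes: more candidates make it more likely that the true Hamming nearest neighbor survives the filtering step, at the cost of more Hamming comparisons.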
Our experiments demonstrate that the proposed method significantly outperforms the state of the art, achieving higher search accuracy at various speed-up factors. For example, search accuracy improves by at least 16 percentage points over previous methods (from 67.7% to 83.7%) when the search is 200 times faster than a linear scan on a database of one million items.
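The two quantities reported above, search accuracy and speed-up over a linear scan, can be measured with a small harness such as the one below. The abstract does not define search accuracy, so this sketch assumes it is the fraction of queries whose returned neighbor lies at the true nearest Hamming distance, and that the speed-up factor is linear-scan time divided by method time. Codes are assumed to be packed into bytes (for example, 256-bit descriptors as 32 bytes). `subset_scan` is a hypothetical stand-in that only scans a random fraction of the database; it is neither the proposed method nor one of the compared methods and exists only to exercise the metrics.

```python
import time

import numpy as np

# Set-bit count for every possible byte value (lookup-table popcount).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.int64)


def hamming_to_all(db_packed, q_packed):
    """Hamming distances from one packed query to every packed database row."""
    return POPCOUNT[np.bitwise_xor(db_packed, q_packed)].sum(axis=1)


def linear_scan(db_packed, queries):
    """Exact nearest-neighbor Hamming distance for each query (brute force)."""
    return np.array([hamming_to_all(db_packed, q).min() for q in queries])


def subset_scan(db_packed, queries, fraction, seed=0):
    """Hypothetical approximate searcher: scan only a random subset of the database."""
    rng = np.random.default_rng(seed)
    m = max(1, int(fraction * db_packed.shape[0]))
    subset = db_packed[rng.choice(db_packed.shape[0], size=m, replace=False)]
    return np.array([hamming_to_all(subset, q).min() for q in queries])


def evaluate(db_packed, queries, fraction):
    """Return (search accuracy, speed-up factor) relative to the linear scan."""
    t0 = time.perf_counter()
    exact = linear_scan(db_packed, queries)
    t1 = time.perf_counter()
    approx = subset_scan(db_packed, queries, fraction)
    t2 = time.perf_counter()
    # Assumed accuracy definition: the returned neighbor is at the true minimum distance.
    accuracy = float(np.mean(approx == exact))
    return accuracy, (t1 - t0) / (t2 - t1)


rng = np.random.default_rng(1)
db = rng.integers(0, 256, size=(50_000, 32), dtype=np.uint8)  # 256-bit codes
queries = rng.integers(0, 256, size=(10, 32), dtype=np.uint8)
for fraction in (1.0, 0.1, 0.01):
    acc, speedup = evaluate(db, queries, fraction)
    print(f"fraction={fraction}: accuracy={acc:.2f}, speed-up={speedup:.1f}x")
```

Timings depend on the machine, so no figures are given here; the point is only that accuracy and speed-up must be reported together, as in the comparison above.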
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2019.107082