Toward more efficient locality‐sensitive hashing via constructing novel hash function cluster

Bibliographic details
Published in: Concurrency and Computation, 2021-10, Vol. 33 (20), p. n/a
Main authors: Zhang, Shi; Huang, Jin; Xiao, Ruliang; Du, Xin; Gong, Ping; Lin, Xinhong
Format: Article
Language: English
Description
Abstract: Locality‐sensitive hashing (LSH) is widely used for nearest neighbor search in large‐scale, high‐dimensional data. However, LSH methods suffer from a serious imbalance between the efficiency of index construction and query accuracy. In this article, we propose a novel higher‐entropy‐hyperplane clusters LSH (HEHC‐LSH) algorithm. We improve vector quantization to preprocess the data, which greatly shortens the preprocessing time. We integrate the maximum entropy principle into the distribution estimation algorithm to construct a novel hash function cluster method, incorporate bootstrap aggregating (bagging) from ensemble learning, and adopt a parallel index dictionary to improve the generalization performance of the index structure. In the query stage, we apply the ensemble learning idea to filter the index set comprehensively, which both avoids a large number of distance calculations and improves the quality of the query results. We also analyze the rationality and effectiveness of the proposed method. Finally, extensive experiments show that HEHC‐LSH achieves higher precision and efficiency simultaneously compared with current methods and exhibits strong robustness across different datasets.
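The abstract combines several generic building blocks: sign (random‐hyperplane) hashing, a preference for hash bits with high entropy, bagging over bootstrap samples, and vote‐based filtering of candidates before exact distance computation. The Python sketch here illustrates only those generic ideas under stated assumptions; it is not HEHC‐LSH. The entropy‐based hyperplane selection is an assumed stand‐in (draw random Gaussian hyperplanes, keep the k whose bits are most balanced on the data), not the authors' distribution‐estimation construction, and the vector‐quantization preprocessing and parallel index dictionary are omitted. All names (`BaggedHyperplaneLSH`, `pick_high_entropy_planes`, `min_votes`, etc.) are hypothetical.

```python
import math
import random
from collections import defaultdict


def bit_entropy(bits):
    """Binary (Shannon) entropy, in bits, of a list of 0/1 values."""
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def hyperplane_bit(plane, x):
    """Sign-random-projection hash bit: which side of the hyperplane x lies on."""
    return 1 if sum(a * b for a, b in zip(plane, x)) >= 0 else 0


def pick_high_entropy_planes(sample, dim, k, n_candidates, rng):
    """Draw random Gaussian hyperplanes through the origin and keep the k whose
    bits are most balanced (highest entropy) on `sample`. Assumed stand-in for
    the paper's maximum-entropy hash-function construction."""
    candidates = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_candidates)]
    candidates.sort(
        key=lambda p: bit_entropy([hyperplane_bit(p, x) for x in sample]),
        reverse=True,
    )
    return candidates[:k]


class BaggedHyperplaneLSH:
    """Hypothetical illustration, not HEHC-LSH: several hash tables, each built
    from hyperplanes chosen on a bootstrap sample (bagging)."""

    def __init__(self, data, n_tables=8, k=6, n_candidates=30, seed=0):
        rng = random.Random(seed)
        self.data = data
        dim = len(data[0])
        self.tables = []
        for _ in range(n_tables):
            # Bootstrap sample: draw len(data) points with replacement.
            sample = [rng.choice(data) for _ in range(len(data))]
            planes = pick_high_entropy_planes(sample, dim, k, n_candidates, rng)
            buckets = defaultdict(list)
            for i, x in enumerate(data):  # index every point, not just the sample
                buckets[self._key(planes, x)].append(i)
            self.tables.append((planes, buckets))

    @staticmethod
    def _key(planes, x):
        return tuple(hyperplane_bit(p, x) for p in planes)

    def query(self, q, min_votes=2, top=1):
        """Count in how many tables each point shares q's bucket, keep points with
        at least `min_votes` collisions, then rank only those by exact distance."""
        votes = defaultdict(int)
        for planes, buckets in self.tables:
            for i in buckets.get(self._key(planes, q), []):
                votes[i] += 1
        survivors = [i for i, v in votes.items() if v >= min_votes]
        survivors.sort(key=lambda i: math.dist(self.data[i], q))
        return survivors[:top]


if __name__ == "__main__":
    rng = random.Random(1)
    points = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
    index = BaggedHyperplaneLSH(points)
    print(index.query(points[17]))  # [17]: a stored point is its own nearest neighbor
```

Raising `min_votes` shrinks the candidate set that needs exact distances (fewer distance computations) at some cost in recall; `min_votes=1` degenerates to a plain union of the colliding buckets across tables.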
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.6355