GoldFinger: Fast & Approximate Jaccard for Efficient KNN Graph Constructions

Bibliographic details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2023-11, Vol. 35 (11), p. 1-14
Main authors: Guerraoui, Rachid; Kermarrec, Anne-Marie; Niot, Guilhem; Ruas, Olivier; Taiani, Francois
Format: Article
Language: English
Description
Abstract: We propose GoldFinger, a new compact and fast-to-compute binary representation of datasets to approximate Jaccard's index. We illustrate the effectiveness of GoldFinger on the emblematic Big Data problem of K-Nearest-Neighbor (KNN) graph construction and show that GoldFinger can drastically accelerate a large range of existing KNN algorithms with little to no overhead. As a side effect, we also show that the compact representation of the data protects users' privacy for free by providing k-anonymity and l-diversity. Our extensive evaluation of the resulting approach on several realistic datasets shows that our approach reduces computation times by up to 78.9% compared to raw data while only incurring a negligible to moderate loss in terms of KNN quality. We also show that GoldFinger can be applied to KNN queries (a widely-used search technique) and delivers speedups of up to ×3.55 over one of the most efficient approaches to this problem.
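The abstract does not spell out how the binary representation is built, so the following is only a minimal sketch of one way a compact bit fingerprint can approximate Jaccard's index, not the authors' implementation. It assumes each item is hashed to a single bit position in a fixed-width bit vector and that the index is estimated from the bitwise AND and OR of two fingerprints. The function names, the 1024-bit width, and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib


def fingerprint(items, n_bits=1024):
    """Build an n_bits-wide fingerprint (stored as a Python int).

    Assumption: every item sets exactly one bit, chosen by hashing the item.
    """
    fp = 0
    for item in items:
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        fp |= 1 << (int.from_bytes(digest, "big") % n_bits)
    return fp


def jaccard_exact(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| on the raw item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_estimate(fp_a, fp_b):
    """Approximate Jaccard index from two fingerprints.

    Uses the ratio of set bits in (fp_a AND fp_b) to set bits in (fp_a OR fp_b).
    Hash collisions make this an estimate, not an exact value.
    """
    union_bits = bin(fp_a | fp_b).count("1")
    if union_bits == 0:
        return 0.0
    return bin(fp_a & fp_b).count("1") / union_bits


if __name__ == "__main__":
    alice = {"item%d" % i for i in range(0, 60)}
    bob = {"item%d" % i for i in range(30, 90)}
    print("exact:   ", jaccard_exact(alice, bob))
    print("estimate:", jaccard_estimate(fingerprint(alice), fingerprint(bob)))
```

In a sketch like this, comparing two profiles costs a few bitwise operations on fixed-size words regardless of profile size, which is the kind of saving the abstract attributes to a compact representation. A smaller `n_bits` gives cheaper comparisons but more collisions, so the estimate drifts further from the exact value.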
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2022.3232689