Generalized Residual Vector Quantization and Aggregating Tree for Large Scale Search


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Transactions on Multimedia 2017-08, Vol. 19 (8), p. 1785-1797
Main Authors: Liu, Shicong; Shao, Junru; Lu, Hongtao
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Abstract: Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization framework that iteratively minimizes quantization error. First, we provide a detailed review of a relevant vector quantization method named residual vector quantization (RVQ). Next, we propose generalized residual vector quantization (GRVQ) to further improve over RVQ. Many vector quantization methods can be viewed as special cases of our proposed method. To enable GRVQ on billion scale data, we introduce a non-exhaustive search scheme named aggregating tree (A-Tree) for high dimensional data that uses GRVQ encodings to build a radix tree and performs nearest neighbor search by beam search. To search accurately and efficiently, VQ encodings should satisfy a locally aggregating encoding criterion: for any node of the corresponding A-Tree, neighboring vectors should aggregate in fewer subtrees to make beam search efficient. We show that the proposed GRVQ encodings best satisfy the suggested criterion, and the joint use of GRVQ and A-Tree shows significantly better performance on billion scale datasets. Our methods are validated on several standard benchmark datasets. Experimental results and empirical analysis show the superior efficiency and effectiveness of our proposed methods compared to the state-of-the-art for large scale search.
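As background for the abstract, the residual vector quantization scheme it builds on can be sketched as follows: each stage picks the nearest codeword from its codebook and quantizes what remains. This is a minimal illustrative sketch, not the paper's implementation; the function names (`rvq_encode`, `rvq_decode`) and the assumption of pre-trained codebooks are ours, and GRVQ's generalizations (e.g. revisiting and re-optimizing earlier stages) are not shown.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual vector quantization: at each stage, select the nearest
    codeword from that stage's codebook and subtract it, so the next
    stage quantizes the remaining residual."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for C in codebooks:                      # C: (K, d) array of codewords
        dists = np.sum((C - residual) ** 2, axis=1)
        k = int(np.argmin(dists))            # nearest codeword index
        codes.append(k)
        residual -= C[k]                     # pass the residual onward
    return codes, residual                   # residual = final quantization error

def rvq_decode(codes, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    return sum(C[k] for C, k in zip(codebooks, codes))
```

The per-stage code sequence is exactly what the A-Tree described in the abstract uses as radix-tree path labels, so that beam search can descend stage by stage.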
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2017.2692181