Generalized Residual Vector Quantization and Aggregating Tree for Large Scale Search

Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization framework that iteratively minimizes quantization error. First, we provide a detailed review on a relevant vector quantization method named residual vector quantization (RVQ). Next, we propose generalized residual vector quantization (GRVQ) to further improve over RVQ. Many vector quantization methods can be viewed as special cases of our proposed method. To enable GRVQ on billion scale data, we introduce a nonexhaustive search scheme named aggregating tree (A-Tree) for high dimensional data that uses GRVQ encodings to build a radix tree and perform the nearest neighbor search by beam search. To search accurately and efficiently, VQ-encodings should satisfy locally aggregating encoding criterion: For any node of the corresponding A-Tree, neighboring vectors should aggregate in fewer subtrees to make beam search efficient. We show that the proposed GRVQ encodings best satisfy the suggested criterion, and the joint use of GRVQ and A-Tree shows significantly better performances on billion scale datasets. Our methods are validated on several standard benchmark datasets. Experimental results and empirical analysis show the superior efficiency and effectiveness of our proposed methods compared to the state-of-the-art for large scale search.

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on multimedia 2017-08, Vol.19 (8), p.1785-1797
Main authors: Liu, Shicong, Shao, Junru, Lu, Hongtao
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 1797
container_issue 8
container_start_page 1785
container_title IEEE transactions on multimedia
container_volume 19
creator Liu, Shicong
Shao, Junru
Lu, Hongtao
description Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization framework that iteratively minimizes quantization error. First, we provide a detailed review on a relevant vector quantization method named residual vector quantization (RVQ). Next, we propose generalized residual vector quantization (GRVQ) to further improve over RVQ. Many vector quantization methods can be viewed as special cases of our proposed method. To enable GRVQ on billion scale data, we introduce a nonexhaustive search scheme named aggregating tree (A-Tree) for high dimensional data that uses GRVQ encodings to build a radix tree and perform the nearest neighbor search by beam search. To search accurately and efficiently, VQ-encodings should satisfy locally aggregating encoding criterion: For any node of the corresponding A-Tree, neighboring vectors should aggregate in fewer subtrees to make beam search efficient. We show that the proposed GRVQ encodings best satisfy the suggested criterion, and the joint use of GRVQ and A-Tree shows significantly better performances on billion scale datasets. Our methods are validated on several standard benchmark datasets. Experimental results and empirical analysis show the superior efficiency and effectiveness of our proposed methods compared to the state-of-the-art for large scale search.
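The residual vector quantization (RVQ) that the paper reviews and generalizes can be sketched in a few lines: each stage runs k-means on the residuals left over from the previous stages, and a vector is encoded by greedily picking the nearest centroid per stage. The sketch below is a minimal illustration of classical RVQ with a tiny hand-rolled k-means, not the paper's GRVQ; all function names and parameter choices are illustrative.

```python
import numpy as np

def train_rvq(X, num_stages=2, K=4, iters=10, seed=0):
    """Train classical RVQ: at each stage, run k-means on the residuals
    left by the previous stages (a simplified sketch, not GRVQ)."""
    rng = np.random.default_rng(seed)
    residual = np.asarray(X, float).copy()
    codebooks = []
    for _ in range(num_stages):
        # k-means on the current residuals
        centers = residual[rng.choice(len(residual), K, replace=False)]
        for _ in range(iters):
            d = ((residual[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for k in range(K):
                pts = residual[assign == k]
                if len(pts):
                    centers[k] = pts.mean(0)
        codebooks.append(centers.copy())
        # pass the per-point residuals on to the next stage
        d = ((residual[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        residual = residual - centers[d.argmin(1)]
    return codebooks

def encode(x, codebooks):
    """Greedy RVQ encoding: pick the nearest centroid at each stage."""
    codes, r = [], np.asarray(x, float).copy()
    for C in codebooks:
        k = int(((C - r) ** 2).sum(1).argmin())
        codes.append(k)
        r = r - C[k]
    return codes

def decode(codes, codebooks):
    """Reconstruct by summing one centroid per stage."""
    return sum(C[k] for k, C in zip(codes, codebooks))
```

GRVQ differs in that it revisits and re-optimizes earlier stages instead of fixing them greedily, which is what lets it subsume other quantizers as special cases.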
doi_str_mv 10.1109/TMM.2017.2692181
format Article
publisher Piscataway: IEEE
coden ITMUF8
ieee_id 7894185
startdate 2017-08-01
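The A-Tree idea — index the database by its VQ code prefixes and search with a beam — can be illustrated with a toy prefix (radix) tree over stage codes. This is a sketch in the spirit of the paper's A-Tree, not its exact construction; the scoring of a prefix by the residual of its partial reconstruction and all names here are illustrative.

```python
import numpy as np

def beam_search(query, codebooks, codes_list, beam=2):
    """Toy non-exhaustive search over a prefix tree of VQ codes."""
    # Build per-level children and leaf buckets from the stored codes.
    children, leaves = {}, {}
    for i, c in enumerate(codes_list):
        c = tuple(c)
        leaves.setdefault(c, []).append(i)
        for d in range(len(c)):
            children.setdefault(c[:d], set()).add(c[d])
    # Beam over code prefixes, scoring each by the squared norm of the
    # residual left after subtracting the partial reconstruction.
    frontier = [((), np.asarray(query, float))]
    for C in codebooks:
        cand = []
        for prefix, r in frontier:
            for k in children.get(prefix, ()):
                r2 = r - C[k]
                cand.append((float((r2 ** 2).sum()), prefix + (k,), r2))
        cand.sort(key=lambda t: t[0])
        frontier = [(p, r) for _, p, r in cand[:beam]]
    # Return database ids under the surviving leaves.
    out = []
    for p, _ in frontier:
        out.extend(leaves.get(p, []))
    return out
```

The locally aggregating criterion matters here: the fewer subtrees a query's true neighbors are spread across, the smaller the beam can be while still reaching them.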
fulltext fulltext_linktorsrc
identifier ISSN: 1520-9210
ispartof IEEE transactions on multimedia, 2017-08, Vol.19 (8), p.1785-1797
issn 1520-9210
1941-0077
language eng
recordid cdi_proquest_journals_1920468189
source IEEE Electronic Library (IEL)
subjects Criteria
Datasets
Empirical analysis
Encoding
Euclidean space
Feedback control systems
High dimensional data
Indexes
Information entropy
Information retrieval
large scale data
Measurement
Methods
nearest neighbor search
Nearest neighbor searches
Searching
Similarity
similarity search
Vector quantization
title Generalized Residual Vector Quantization and Aggregating Tree for Large Scale Search
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T10%3A25%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generalized%20Residual%20Vector%20Quantization%20and%20Aggregating%20Tree%20for%20Large%20Scale%20Search&rft.jtitle=IEEE%20transactions%20on%20multimedia&rft.au=Liu,%20Shicong&rft.date=2017-08-01&rft.volume=19&rft.issue=8&rft.spage=1785&rft.epage=1797&rft.pages=1785-1797&rft.issn=1520-9210&rft.eissn=1941-0077&rft.coden=ITMUF8&rft_id=info:doi/10.1109/TMM.2017.2692181&rft_dat=%3Cproquest_RIE%3E1920468189%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1920468189&rft_id=info:pmid/&rft_ieee_id=7894185&rfr_iscdi=true