HashFile: An efficient index structure for multimedia data

Nearest neighbor (NN) search in high-dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of the data when answering exact NN queries. To improve the efficienc...

Bibliographic details
Main authors: Dongxiang Zhang, Agrawal, D, Gang Chen, Tung, A K H
Format: Conference proceeding
Language: eng
Subjects:
container_end_page 1114
container_issue
container_start_page 1103
container_title
container_volume
creator Dongxiang Zhang
Agrawal, D
Gang Chen
Tung, A K H
description Nearest neighbor (NN) search in high-dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of the data when answering exact NN queries. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NNs. They adopt hash functions that preserve the Euclidean distance, so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidates for the query result are obtained by accessing the points located in the same bucket. To improve precision, each hash table is associated with m hash functions that recursively hash the data points into smaller buckets and remove false positives. On the other hand, multiple hash tables are required to guarantee high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, the locality sensitive B-tree (LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximation ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family, in which each bucket is associated with a concatenation of m hash values, we recursively partition only the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of hash value and can be efficiently loaded into memory for a linear scan. HashFile supports both exact and approximate NN queries. Experimental results show that HashFile performs better than existing indexes in answering both types of NN queries.
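The baseline LSH scheme that the abstract contrasts with HashFile (m concatenated hash values per bucket, several hash tables for recall) can be illustrated with a small sketch. This is generic Euclidean LSH, not the HashFile index and not the authors' code. It assumes the commonly used random-projection hash h(v) = floor((a·v + b) / w); the class name `EuclideanLSH` and the parameters `m`, `num_tables`, and `w` are hypothetical choices made only for this illustration.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Toy LSH index: several tables, each keyed by m concatenated hash values.

    Illustrative only; names and default parameters are assumptions.
    """

    def __init__(self, dim, m=4, num_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # One (a, b) pair per hash function: a is a Gaussian random vector,
        # b is a random offset drawn uniformly from [0, w].
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(m)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, table_idx, v):
        # Concatenate m values floor((a . v + b) / w) into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in self.funcs[table_idx]
        )

    def add(self, v):
        # Store the point once and register its index in every table.
        idx = len(self.points)
        self.points.append(tuple(v))
        for t, table in enumerate(self.tables):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, q):
        # Candidates are the points sharing q's bucket in at least one table.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, q), ()))
        if not candidates:
            return None  # approximate search may miss every point
        # Rank the candidates by their true Euclidean distance to q.
        return min(candidates, key=lambda i: math.dist(self.points[i], q))
```

In this sketch, a larger `m` makes buckets smaller (fewer false positives), while a larger `num_tables` raises the chance that a true neighbor shares a bucket with the query in at least one table. That is the precision and recall tradeoff the abstract describes.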
doi_str_mv 10.1109/ICDE.2011.5767837
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5767837</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5767837</ieee_id><sourcerecordid>5767837</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-4a741dafeef42384aadc6162b4aae2a8f8daa79fe8f5d8c1f470ddc5cab383543</originalsourceid><addsrcrecordid>eNotkEtLw0AUhccXGGt-gLiZP5A4d97prqStLRTcKLgrt5k7OJJWyQP031swZ3M-OPAtDmMPIEoAUT1t6-WqlAKgNM46r9wFuwMttfaVFXDJMqmcKYS071csr5yfNlPZa5aBsKqwystblvf9pzin0gBGZGy-wf5jnVqa88WJU4ypSXQaeDoF-uH90I3NMHbE41fHj2M7pCOFhDzggPfsJmLbUz71jL2tV6_1pti9PG_rxa5I4MxQaHQaAkaiqKXyGjE0Fqw8nIkk-ugDoqsi-WiCbyBqJ0JoTIMH5ZXRasYe_72JiPbfXTpi97ufblB_GHtNJA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>HashFile: An efficient index structure for multimedia data</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Dongxiang Zhang ; Agrawal, D ; Gang Chen ; Tung, A K H</creator><creatorcontrib>Dongxiang Zhang ; Agrawal, D ; Gang Chen ; Tung, A K H</creatorcontrib><description>Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. 
To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries. 
Experimental results show that HashFile performs better than existing indexes both in answering both types of NN queries.</description><identifier>ISSN: 1063-6382</identifier><identifier>ISBN: 9781424489596</identifier><identifier>ISBN: 1424489598</identifier><identifier>EISSN: 2375-026X</identifier><identifier>EISBN: 1424489601</identifier><identifier>EISBN: 142448958X</identifier><identifier>EISBN: 9781424489589</identifier><identifier>EISBN: 9781424489602</identifier><identifier>DOI: 10.1109/ICDE.2011.5767837</identifier><language>eng</language><publisher>IEEE</publisher><subject>Lead</subject><ispartof>2011 IEEE 27th International Conference on Data Engineering, 2011, p.1103-1114</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5767837$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2056,27923,54918</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5767837$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dongxiang Zhang</creatorcontrib><creatorcontrib>Agrawal, D</creatorcontrib><creatorcontrib>Gang Chen</creatorcontrib><creatorcontrib>Tung, A K H</creatorcontrib><title>HashFile: An efficient index structure for multimedia data</title><title>2011 IEEE 27th International Conference on Data Engineering</title><addtitle>ICDE</addtitle><description>Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. 
They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries. 
Experimental results show that HashFile performs better than existing indexes both in answering both types of NN queries.</description><subject>Lead</subject><issn>1063-6382</issn><issn>2375-026X</issn><isbn>9781424489596</isbn><isbn>1424489598</isbn><isbn>1424489601</isbn><isbn>142448958X</isbn><isbn>9781424489589</isbn><isbn>9781424489602</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkEtLw0AUhccXGGt-gLiZP5A4d97prqStLRTcKLgrt5k7OJJWyQP031swZ3M-OPAtDmMPIEoAUT1t6-WqlAKgNM46r9wFuwMttfaVFXDJMqmcKYS071csr5yfNlPZa5aBsKqwystblvf9pzin0gBGZGy-wf5jnVqa88WJU4ypSXQaeDoF-uH90I3NMHbE41fHj2M7pCOFhDzggPfsJmLbUz71jL2tV6_1pti9PG_rxa5I4MxQaHQaAkaiqKXyGjE0Fqw8nIkk-ugDoqsi-WiCbyBqJ0JoTIMH5ZXRasYe_72JiPbfXTpi97ufblB_GHtNJA</recordid><startdate>201104</startdate><enddate>201104</enddate><creator>Dongxiang Zhang</creator><creator>Agrawal, D</creator><creator>Gang Chen</creator><creator>Tung, A K H</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201104</creationdate><title>HashFile: An efficient index structure for multimedia data</title><author>Dongxiang Zhang ; Agrawal, D ; Gang Chen ; Tung, A K H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-4a741dafeef42384aadc6162b4aae2a8f8daa79fe8f5d8c1f470ddc5cab383543</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Lead</topic><toplevel>online_resources</toplevel><creatorcontrib>Dongxiang Zhang</creatorcontrib><creatorcontrib>Agrawal, D</creatorcontrib><creatorcontrib>Gang Chen</creatorcontrib><creatorcontrib>Tung, A K H</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE 
Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dongxiang Zhang</au><au>Agrawal, D</au><au>Gang Chen</au><au>Tung, A K H</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>HashFile: An efficient index structure for multimedia data</atitle><btitle>2011 IEEE 27th International Conference on Data Engineering</btitle><stitle>ICDE</stitle><date>2011-04</date><risdate>2011</risdate><spage>1103</spage><epage>1114</epage><pages>1103-1114</pages><issn>1063-6382</issn><eissn>2375-026X</eissn><isbn>9781424489596</isbn><isbn>1424489598</isbn><eisbn>1424489601</eisbn><eisbn>142448958X</eisbn><eisbn>9781424489589</eisbn><eisbn>9781424489602</eisbn><abstract>Nearest neighbor (NN) search in high dimensional space is an essential query in many multimedia retrieval applications. Due to the curse of dimensionality, existing index structures might perform even worse than a simple sequential scan of data when answering exact NN query. To improve the efficiency of NN search, locality sensitive hashing (LSH) and its variants have been proposed to find approximate NN. They adopt hash functions that can preserve the Euclidean distance so that similar objects have a high probability of colliding in the same bucket. Given a query object, candidate for the query result is obtained by accessing the points that are located in the same bucket. To improve the precision, each hash table is associated with m hash functions to recursively hash the data points into smaller buckets and remove the false positives. On the other hand, multiple hash tables are required to guarantee a high retrieval recall. 
Thus, tuning a good tradeoff between precision and recall becomes the main challenge for LSH. Recently, locality sensitive B-tree(LSB-tree) has been proposed to ensure both quality and efficiency. However, the index uses random I/O access. When the multimedia database is large, it requires considerable disk I/O cost to obtain an approximate ratio that works in practice. In this paper, we propose a novel index structure, named HashFile, for efficient retrieval of multimedia objects. It combines the advantages of random projection and linear scan. Unlike the LSH family in which each bucket is associated with a concatenation of m hash values, we only recursively partition the dense buckets and organize them as a tree structure. Given a query point q, the search algorithm explores the buckets near the query object in a top-down manner. The candidate buckets in each node are stored sequentially in increasing order of the hash value and can be efficiently loaded into memory for linear scan. HashFile can support both exact and approximate NN queries. Experimental results show that HashFile performs better than existing indexes both in answering both types of NN queries.</abstract><pub>IEEE</pub><doi>10.1109/ICDE.2011.5767837</doi><tpages>12</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6382
ispartof 2011 IEEE 27th International Conference on Data Engineering, 2011, p.1103-1114
issn 1063-6382
2375-026X
language eng
recordid cdi_ieee_primary_5767837
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Lead
title HashFile: An efficient index structure for multimedia data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T20%3A33%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=HashFile:%20An%20efficient%20index%20structure%20for%20multimedia%20data&rft.btitle=2011%20IEEE%2027th%20International%20Conference%20on%20Data%20Engineering&rft.au=Dongxiang%20Zhang&rft.date=2011-04&rft.spage=1103&rft.epage=1114&rft.pages=1103-1114&rft.issn=1063-6382&rft.eissn=2375-026X&rft.isbn=9781424489596&rft.isbn_list=1424489598&rft_id=info:doi/10.1109/ICDE.2011.5767837&rft_dat=%3Cieee_6IE%3E5767837%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424489601&rft.eisbn_list=142448958X&rft.eisbn_list=9781424489589&rft.eisbn_list=9781424489602&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5767837&rfr_iscdi=true