Scalable Probabilistic Similarity Ranking in Uncertain Databases
Published in: | IEEE Transactions on Knowledge and Data Engineering, 2010-09, Vol. 22 (9), pp. 1234-1246 |
---|---|
Main authors: | Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zuefle, Andreas |
Format: | Article |
Language: | English |
container_end_page | 1246 |
---|---|
container_issue | 9 |
container_start_page | 1234 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 22 |
creator | Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zuefle, Andreas |
description | This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain objects according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and each ranking position, the probability that the object falls at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this distribution by applying the Poisson binomial recurrence, which has quadratic complexity. In this paper, we show both theoretically and experimentally that our framework reduces this to linear-time complexity with the same memory requirements; this is made possible by accessing the uncertain vector instances incrementally, in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to perform probabilistic top-k ranking of the objects under different state-of-the-art definitions. An experimental evaluation on synthetic and real data demonstrates the efficiency of our approach. |
doi_str_mv | 10.1109/TKDE.2010.78 |
format | Article |
publisher | IEEE, New York |
coden | ITKEEH |
tpages | 13 |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2010 |
linktohtml | https://ieeexplore.ieee.org/document/5467070 |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2010-09, Vol.22 (9), p.1234-1246 |
issn | 1041-4347 (print); 1558-2191 (electronic) |
language | eng |
source | IEEE Electronic Library (IEL) |
subjects | Data mining; Image databases; Mathematical analysis; Mathematical models; Multimedia databases; Nearest neighbor searches; Object detection; Probabilistic methods; probabilistic ranking; Probability distribution; Probability theory; Ranking; Ratings & rankings; Search engines; Similarity; similarity search; Spatial databases; State of the art; Studies; Temperature sensors; Uncertain databases; Uncertainty; Vectors (mathematics) |
title | Scalable Probabilistic Similarity Ranking in Uncertain Databases |
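The description in this record says that rank probabilities are obtained with the Poisson binomial recurrence, and that scanning instances in increasing order of distance brings the quadratic cost of that recurrence down to linear. The Python sketch below illustrates one way such an incremental update can work; it is not the authors' algorithm, and every name in it (`add_factor`, `remove_factor`, `rank_probabilities`, `rank_probabilities_naive`, `top_k_probability`) is hypothetical. It assumes that objects are mutually independent, that the instances of one object are mutually exclusive with probabilities summing to at most 1, and that no two instances are equidistant from the reference object. The incremental version keeps a single Poisson binomial distribution over all objects seen so far. For each new instance it divides out one object's factor and multiplies the updated factor back in, so each instance costs O(N) work on the distribution rather than a rebuild from scratch.

```python
"""Sketch of rank-probability computation for uncertain objects (hypothetical names).

Assumptions: objects are independent; instances of one object are mutually
exclusive with probabilities summing to at most 1; no distance ties.
"""
from typing import Dict, List, Tuple

Instance = Tuple[str, float, float]  # (object id, distance to reference, probability)


def add_factor(pmf: List[float], p: float) -> List[float]:
    """One Poisson binomial recurrence step: multiply by (1 - p + p*x). O(len(pmf))."""
    out = [0.0] * (len(pmf) + 1)
    for i, q in enumerate(pmf):
        out[i] += q * (1.0 - p)   # this object is not closer
        out[i + 1] += q * p       # this object is closer
    return out


def remove_factor(pmf: List[float], p: float) -> List[float]:
    """Inverse of add_factor: divide (1 - p + p*x) out of pmf. O(len(pmf))."""
    n = len(pmf) - 1
    out = [0.0] * n
    if p <= 0.5:  # forward: pmf[i] = (1-p)*out[i] + p*out[i-1]; divide by 1-p >= 0.5
        prev = 0.0
        for i in range(n):
            out[i] = (pmf[i] - p * prev) / (1.0 - p)
            prev = out[i]
    else:  # backward: pmf[i+1] = (1-p)*out[i+1] + p*out[i]; divide by p > 0.5
        nxt = 0.0
        for i in range(n - 1, -1, -1):
            out[i] = (pmf[i + 1] - (1.0 - p) * nxt) / p
            nxt = out[i]
    return out


def rank_probabilities_naive(instances: List[Instance]) -> Dict[str, List[float]]:
    """Baseline: rebuild the Poisson binomial pmf from scratch for every instance."""
    objects = sorted({o for o, _, _ in instances})
    result = {o: [0.0] * len(objects) for o in objects}
    for obj, d, prob in instances:
        pmf = [1.0]
        for other in objects:
            if other != obj:
                # probability that `other` lies closer to the reference than distance d
                p = sum(pr for o2, d2, pr in instances if o2 == other and d2 < d)
                pmf = add_factor(pmf, p)
        for j, q in enumerate(pmf):  # j other objects closer -> rank j + 1
            result[obj][j] += prob * q
    return result


def rank_probabilities(instances: List[Instance]) -> Dict[str, List[float]]:
    """Incremental scan in increasing distance; O(N) pmf work per instance."""
    objects = {o for o, _, _ in instances}
    result = {o: [0.0] * len(objects) for o in objects}
    closer: Dict[str, float] = {}  # P(object lies closer than the current distance)
    pmf = [1.0]  # pmf of the number of closer objects, over objects seen so far
    for obj, _, prob in sorted(instances, key=lambda t: t[1]):
        # distribution over all *other* objects: divide out obj's current factor
        others = remove_factor(pmf, closer[obj]) if obj in closer else pmf
        for j, q in enumerate(others):
            result[obj][j] += prob * q
        closer[obj] = closer.get(obj, 0.0) + prob
        pmf = add_factor(others, closer[obj])  # re-insert obj with its new probability
    return result


def top_k_probability(ranks: List[float], k: int) -> float:
    """P(object is among the k objects nearest to the reference)."""
    return sum(ranks[:k])


if __name__ == "__main__":
    data = [("A", 1.0, 0.6), ("A", 4.0, 0.4), ("B", 2.0, 0.5), ("B", 3.0, 0.5), ("C", 2.5, 1.0)]
    for obj, ranks in sorted(rank_probabilities(data).items()):
        print(obj, [round(r, 3) for r in ranks], round(top_k_probability(ranks, 2), 3))
```

Summing the first k entries of an object's rank distribution, as `top_k_probability` does, gives the probability that the object is among the k nearest objects, a quantity that several probabilistic top-k definitions build on. The branch on `p` in `remove_factor` picks the recurrence direction that avoids dividing by a small number. Even so, rounding errors can accumulate over very long scans, so a practical version might periodically rebuild the distribution from scratch.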