Scalable Probabilistic Similarity Ranking in Uncertain Databases

This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a refe...

Bibliographic details
Published in: IEEE transactions on knowledge and data engineering, 2010-09, Vol. 22 (9), p. 1234-1246
Main authors: Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zuefle, Andreas
Format: Article
Language: eng
container_end_page 1246
container_issue 9
container_start_page 1234
container_title IEEE transactions on knowledge and data engineering
container_volume 22
creator Bernecker, Thomas
Kriegel, Hans-Peter
Mamoulis, Nikos
Renz, Matthias
Zuefle, Andreas
description This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper, we theoretically as well as experimentally show that our framework reduces this to linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
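The description refers to the Poisson binomial recurrence as the quadratic-time baseline used by existing approaches. As a minimal sketch of that baseline only (not the linear-time framework the paper proposes), the following Python computes the distribution of how many of n independent events occur, which gives the rank distribution of one object when each event is "another object is closer to the reference". The function name `poisson_binomial` and the example probabilities are illustrative assumptions, not taken from the paper.

```python
def poisson_binomial(probs):
    """Return dist where dist[j] = P(exactly j of the independent events occur).

    Illustrative baseline: O(n^2) time for n probabilities.
    """
    dist = [1.0]  # with no events considered, zero occurrences is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # this event does not occur
            nxt[j + 1] += mass * p       # this event occurs
        dist = nxt
    return dist


# Hypothetical input: probabilities that two other objects are closer
# to the reference object than the object being ranked.
closer_probs = [0.5, 0.2]
rank_dist = poisson_binomial(closer_probs)
# rank_dist[j] is the probability of rank j + 1 (j objects are closer).
```

Running this once per object and per instance is what makes the baseline expensive; the description states that the paper's framework avoids this by processing instances incrementally in increasing order of distance.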
doi_str_mv 10.1109/TKDE.2010.78
format Article
sourceid proquest_RIE
sourcerecordid 2720445091
pqid 1027778969
ieee_id 5467070
sourcetype Aggregation Database
eissn 1558-2191
coden ITKEEH
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Sep 2010
lds50 peer_reviewed
oa free_for_read
linktohtml https://ieeexplore.ieee.org/document/5467070
creationdate 2010
date 2010-09-01
genre article
tpages 13
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
fulltext fulltext_linktorsrc
identifier ISSN: 1041-4347
ispartof IEEE transactions on knowledge and data engineering, 2010-09, Vol.22 (9), p.1234-1246
issn 1041-4347
1558-2191
language eng
recordid cdi_proquest_miscellaneous_818833109
source IEEE Electronic Library (IEL)
subjects Data mining
Image databases
Mathematical analysis
Mathematical models
Multimedia databases
Nearest neighbor searches
Object detection
Probabilistic methods
probabilistic ranking
Probability distribution
Probability theory
Ranking
Ratings & rankings
Search engines
Similarity
similarity search
Spatial databases
State of the art
Studies
Temperature sensors
Uncertain databases
Uncertainty
Vectors (mathematics)
title Scalable Probabilistic Similarity Ranking in Uncertain Databases
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T02%3A38%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Scalable%20Probabilistic%20Similarity%20Ranking%20in%20Uncertain%20Databases&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Bernecker,%20Thomas&rft.date=2010-09-01&rft.volume=22&rft.issue=9&rft.spage=1234&rft.epage=1246&rft.pages=1234-1246&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2010.78&rft_dat=%3Cproquest_RIE%3E2720445091%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1027778969&rft_id=info:pmid/&rft_ieee_id=5467070&rfr_iscdi=true