Scalable Probabilistic Similarity Ranking in Uncertain Databases

This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that is assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a refe...

Bibliographic details
Published in:	IEEE transactions on knowledge and data engineering 2010-09, Vol.22 (9), p.1234-1246
Main authors:	Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zuefle, Andreas
Format:	Article
Language:	English
Subjects:	Data mining; Image databases; Mathematical analysis; Mathematical models; Multimedia databases; Nearest neighbor searches; Object detection; Probabilistic methods; probabilistic ranking; Probability distribution; Probability theory; Ranking; Ratings & rankings; Search engines; Similarity; similarity search; Spatial databases; State of the art; Studies; Temperature sensors; Uncertain databases; Uncertainty; Vectors (mathematics)

container_end_page	1246
container_issue	9
container_start_page	1234
container_title	IEEE transactions on knowledge and data engineering
container_volume	22
creator	Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zuefle, Andreas
description	This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that is assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper, we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
doi_str_mv	10.1109/TKDE.2010.78
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_818833109</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5467070</ieee_id><sourcerecordid>2720445091</sourcerecordid><originalsourceid>FETCH-LOGICAL-c353t-553771dadae745dd59c9332d683b71a99ba6b3a025f7b538afdbee720748cca03</originalsourceid><addsrcrecordid>eNpd0D1PwzAQBmALgUQpbGwskRhYSLHjuOdsoLZ8iEog2s7W2XGQS5oUOx3673FUxMB070mPTqeXkEtGR4zR4m75Op2NMhpXkEdkwISQacYKdhwzzVma8xxOyVkIa0qpBMkG5H5hsEZd2-Tdtxq1q13onEkWbuNq9K7bJx_YfLnmM3FNsmqM9R3GNMUONQYbzslJhXWwF79zSFaPs-XkOZ2_Pb1MHuap4YJ3qRAcgJVYooVclKUoTMF5Vo4l18CwKDSONUeaiQq04BKrUlsLGYVcGoOUD8nN4e7Wt987Gzq1ccHYusbGtrugJJOS89hClNf_5Lrd-SY-pxjNAEAW417dHpTxbQjeVmrr3Qb9PiLVt6n6NlXfpgIZ-dWBO2vtHxX5GChQ_gME329k</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1027778969</pqid></control><display><type>article</type><title>Scalable Probabilistic Similarity Ranking in Uncertain Databases</title><source>IEEE Electronic Library (IEL)</source><creator>Bernecker, Thomas ; Kriegel, Hans-Peter ; Mamoulis, Nikos ; Renz, Matthias ; Zuefle, Andreas</creator><creatorcontrib>Bernecker, Thomas ; Kriegel, Hans-Peter ; Mamoulis, Nikos ; Renz, Matthias ; Zuefle, Andreas</creatorcontrib><description>This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that is assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. 
Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper, we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2010.78</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Data mining ; Image databases ; Mathematical analysis ; Mathematical models ; Multimedia databases ; Nearest neighbor searches ; Object detection ; Probabilistic methods ; probabilistic ranking ; Probability distribution ; Probability theory ; Ranking ; Ratings & rankings ; Search engines ; Similarity ; similarity search ; Spatial databases ; State of the art ; Studies ; Temperature sensors ; Uncertain databases ; Uncertainty ; Vectors (mathematics)</subject><ispartof>IEEE transactions on knowledge and data engineering, 2010-09, Vol.22 (9), p.1234-1246</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Sep 2010</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c353t-553771dadae745dd59c9332d683b71a99ba6b3a025f7b538afdbee720748cca03</citedby><cites>FETCH-LOGICAL-c353t-553771dadae745dd59c9332d683b71a99ba6b3a025f7b538afdbee720748cca03</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5467070$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5467070$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Bernecker, Thomas</creatorcontrib><creatorcontrib>Kriegel, Hans-Peter</creatorcontrib><creatorcontrib>Mamoulis, Nikos</creatorcontrib><creatorcontrib>Renz, Matthias</creatorcontrib><creatorcontrib>Zuefle, Andreas</creatorcontrib><title>Scalable Probabilistic Similarity Ranking in Uncertain Databases</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that is assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. 
In this paper, we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.</description><subject>Data mining</subject><subject>Image databases</subject><subject>Mathematical analysis</subject><subject>Mathematical models</subject><subject>Multimedia databases</subject><subject>Nearest neighbor searches</subject><subject>Object detection</subject><subject>Probabilistic methods</subject><subject>probabilistic ranking</subject><subject>Probability distribution</subject><subject>Probability theory</subject><subject>Ranking</subject><subject>Ratings & rankings</subject><subject>Search engines</subject><subject>Similarity</subject><subject>similarity search</subject><subject>Spatial databases</subject><subject>State of the art</subject><subject>Studies</subject><subject>Temperature sensors</subject><subject>Uncertain databases</subject><subject>Uncertainty</subject><subject>Vectors 
(mathematics)</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpd0D1PwzAQBmALgUQpbGwskRhYSLHjuOdsoLZ8iEog2s7W2XGQS5oUOx3673FUxMB070mPTqeXkEtGR4zR4m75Op2NMhpXkEdkwISQacYKdhwzzVma8xxOyVkIa0qpBMkG5H5hsEZd2-Tdtxq1q13onEkWbuNq9K7bJx_YfLnmM3FNsmqM9R3GNMUONQYbzslJhXWwF79zSFaPs-XkOZ2_Pb1MHuap4YJ3qRAcgJVYooVclKUoTMF5Vo4l18CwKDSONUeaiQq04BKrUlsLGYVcGoOUD8nN4e7Wt987Gzq1ccHYusbGtrugJJOS89hClNf_5Lrd-SY-pxjNAEAW417dHpTxbQjeVmrr3Qb9PiLVt6n6NlXfpgIZ-dWBO2vtHxX5GChQ_gME329k</recordid><startdate>20100901</startdate><enddate>20100901</enddate><creator>Bernecker, Thomas</creator><creator>Kriegel, Hans-Peter</creator><creator>Mamoulis, Nikos</creator><creator>Renz, Matthias</creator><creator>Zuefle, Andreas</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20100901</creationdate><title>Scalable Probabilistic Similarity Ranking in Uncertain Databases</title><author>Bernecker, Thomas ; Kriegel, Hans-Peter ; Mamoulis, Nikos ; Renz, Matthias ; Zuefle, Andreas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c353t-553771dadae745dd59c9332d683b71a99ba6b3a025f7b538afdbee720748cca03</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Data mining</topic><topic>Image databases</topic><topic>Mathematical analysis</topic><topic>Mathematical models</topic><topic>Multimedia databases</topic><topic>Nearest neighbor searches</topic><topic>Object detection</topic><topic>Probabilistic 
methods</topic><topic>probabilistic ranking</topic><topic>Probability distribution</topic><topic>Probability theory</topic><topic>Ranking</topic><topic>Ratings & rankings</topic><topic>Search engines</topic><topic>Similarity</topic><topic>similarity search</topic><topic>Spatial databases</topic><topic>State of the art</topic><topic>Studies</topic><topic>Temperature sensors</topic><topic>Uncertain databases</topic><topic>Uncertainty</topic><topic>Vectors (mathematics)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bernecker, Thomas</creatorcontrib><creatorcontrib>Kriegel, Hans-Peter</creatorcontrib><creatorcontrib>Mamoulis, Nikos</creatorcontrib><creatorcontrib>Renz, Matthias</creatorcontrib><creatorcontrib>Zuefle, Andreas</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Bernecker, Thomas</au><au>Kriegel, Hans-Peter</au><au>Mamoulis, Nikos</au><au>Renz, Matthias</au><au>Zuefle, 
Andreas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Scalable Probabilistic Similarity Ranking in Uncertain Databases</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2010-09-01</date><risdate>2010</risdate><volume>22</volume><issue>9</issue><spage>1234</spage><epage>1246</epage><pages>1234-1246</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that is assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper, we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2010.78</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1041-4347
ispartof	IEEE transactions on knowledge and data engineering, 2010-09, Vol.22 (9), p.1234-1246
issn	1041-4347 1558-2191
language	eng
recordid	cdi_proquest_miscellaneous_818833109
source	IEEE Electronic Library (IEL)
subjects	Data mining; Image databases; Mathematical analysis; Mathematical models; Multimedia databases; Nearest neighbor searches; Object detection; Probabilistic methods; probabilistic ranking; Probability distribution; Probability theory; Ranking; Ratings & rankings; Search engines; Similarity; similarity search; Spatial databases; State of the art; Studies; Temperature sensors; Uncertain databases; Uncertainty; Vectors (mathematics)
title	Scalable Probabilistic Similarity Ranking in Uncertain Databases
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T02%3A38%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Scalable%20Probabilistic%20Similarity%20Ranking%20in%20Uncertain%20Databases&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Bernecker,%20Thomas&rft.date=2010-09-01&rft.volume=22&rft.issue=9&rft.spage=1234&rft.epage=1246&rft.pages=1234-1246&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2010.78&rft_dat=%3Cproquest_RIE%3E2720445091%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1027778969&rft_id=info:pmid/&rft_ieee_id=5467070&rfr_iscdi=true
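
Illustrative note on the abstract: the description names the Poisson binomial recurrence as the technique that existing approaches use to obtain rank probabilities. The sketch below shows that baseline recurrence only; it is not the linear-time framework proposed in the article, whose details are not given in this record. The function name poisson_binomial, the parameter names, and the input convention are assumptions made for illustration: probs[i] is taken to be the probability that the i-th other object is closer to the reference object than the instance under consideration, with these events assumed independent.

```python
from typing import List


def poisson_binomial(probs: List[float], k: int) -> List[float]:
    """Return [P(exactly j successes) for j = 0 .. k-1].

    Hypothetical helper, not taken from the article: probs[i] is the
    assumed probability that the i-th other object lies closer to the
    reference object than the instance being ranked. With j closer
    objects, the instance sits at rank j + 1, so entry j of the result
    is the probability of rank j + 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # dist[j] = probability that exactly j of the objects seen so far
    # are closer; before any object is processed, j = 0 is certain.
    dist = [1.0] + [0.0] * (k - 1)
    for p in probs:
        # Recurrence: P_i(j) = P_{i-1}(j) * (1 - p) + P_{i-1}(j - 1) * p.
        # Iterating j downwards lets the update run in place.
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return dist


if __name__ == "__main__":
    # Two other objects, each closer with probability 0.5.
    print(poisson_binomial([0.5, 0.5], 3))  # [0.25, 0.5, 0.25]
```

One pass over n probabilities costs O(n * k) here, or O(n^2) when all n ranks are kept, which is the quadratic cost the abstract attributes to this technique when computing a full rank distribution.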