Top-k nearest neighbor search in uncertain data series

Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top- k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top- k nearest neighbor problem for uncer...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the VLDB Endowment 2014-09, Vol.8 (1), p.13-24
Hauptverfasser: Dallachiesa, Michele, Palpanas, Themis, Ilyas, Ihab F.
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top- k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top- k nearest neighbor problem for uncertain data series, and describe a model for uncertain data series that captures both uncertainty and correlation. This distinguishes our approach from prior work that compromises the accuracy of the model by assuming independence of the value distribution at neighboring time-stamps. We introduce the Holistic-PkNN algorithm, which uses novel metric bounds for uncertain series and an efficient refinement strategy to reduce the overall number of required probability estimates. We evaluate our proposal under a variety of settings using a combination of synthetic and 45 real datasets from diverse domains. The results demonstrate the significant advantages of the proposed approach.
ISSN:2150-8097
2150-8097
DOI:10.14778/2735461.2735463