Top-k nearest neighbor search in uncertain data series
Published in: | Proceedings of the VLDB Endowment 2014-09, Vol.8 (1), p.13-24 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top-k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top-k nearest neighbor problem for uncertain data series, and describe a model for uncertain data series that captures both uncertainty and correlation. This distinguishes our approach from prior work that compromises the accuracy of the model by assuming independence of the value distribution at neighboring time-stamps. We introduce the Holistic-PkNN algorithm, which uses novel metric bounds for uncertain series and an efficient refinement strategy to reduce the overall number of required probability estimates. We evaluate our proposal under a variety of settings using a combination of synthetic and 45 real datasets from diverse domains. The results demonstrate the significant advantages of the proposed approach. |
ISSN: | 2150-8097 |
DOI: | 10.14778/2735461.2735463 |