The Online Event-Detection Problem

Given a stream \(S = (s_1, s_2, ..., s_N)\), a \(\phi\)-heavy hitter is an item \(s_i\) that occurs at least \(\phi N\) times in \(S\). The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a \(\p...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:arXiv.org 2018-12
Hauptverfasser: Bender, Michael A, Berry, Jonathan W, Farach-Colton, Martin, Johnson, Rob, Kroeger, Thomas M, Pandey, Prashant, Phillips, Cynthia A, Singh, Shikha
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Given a stream \(S = (s_1, s_2, ..., s_N)\), a \(\phi\)-heavy hitter is an item \(s_i\) that occurs at least \(\phi N\) times in \(S\). The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a \(\phi\)-event at time \(t\) if \(s_t\) occurs exactly \(\phi N\) times in \((s_1, s_2, ..., s_t)\). Thus, for each \(\phi\)-heavy hitter there is a single \(\phi\)-event which occurs when its count reaches the reporting threshold \(\phi N\). We define the online event-detection problem (OEDP) as: given \(\phi\) and a stream \(S\), report all \(\phi\)-events as soon as they occur. Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and a low reporting threshold. As a result, the OEDP requires a large amount of space (Omega(N) words) and is not solvable in the streaming model or via standard sampling-based approaches. Since OEDP requires large space, we focus on cache-efficient algorithms in the external-memory model. We provide algorithms for the OEDP that are within a log factor of optimal. Our algorithms are tunable: its parameters can be set to allow for a bounded false-positives and a bounded delay in reporting. None of our relaxations allow false negatives since reporting all events is a strict requirement of our applications. Finally, we show improved results when the count of items in the input stream follows a power-law distribution.
ISSN:2331-8422