The Online Event-Detection Problem
Given a stream \(S = (s_1, s_2, ..., s_N)\), a \(\phi\)-heavy hitter is an item \(s_i\) that occurs at least \(\phi N\) times in \(S\). The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a \(\p...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2018-12 |
---|---|
Hauptverfasser: | , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Given a stream \(S = (s_1, s_2, ..., s_N)\), a \(\phi\)-heavy hitter is an item \(s_i\) that occurs at least \(\phi N\) times in \(S\). The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a \(\phi\)-event at time \(t\) if \(s_t\) occurs exactly \(\phi N\) times in \((s_1, s_2, ..., s_t)\). Thus, for each \(\phi\)-heavy hitter there is a single \(\phi\)-event which occurs when its count reaches the reporting threshold \(\phi N\). We define the online event-detection problem (OEDP) as: given \(\phi\) and a stream \(S\), report all \(\phi\)-events as soon as they occur. Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and a low reporting threshold. As a result, the OEDP requires a large amount of space (Omega(N) words) and is not solvable in the streaming model or via standard sampling-based approaches. Since OEDP requires large space, we focus on cache-efficient algorithms in the external-memory model. We provide algorithms for the OEDP that are within a log factor of optimal. Our algorithms are tunable: its parameters can be set to allow for a bounded false-positives and a bounded delay in reporting. None of our relaxations allow false negatives since reporting all events is a strict requirement of our applications. Finally, we show improved results when the count of items in the input stream follows a power-law distribution. |
---|---|
ISSN: | 2331-8422 |