CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space

Bibliographic Details
Published in: Proceedings of the VLDB Endowment 2023-07, Vol.16 (11), p.2832-2844
Main authors: Guan, Haoquan; Chen, Ziling; Song, Shaoxu
Format: Article
Language: English
Online access: Full text
Description
Abstract: Median absolute deviation (MAD), the median of the absolute deviations from the median, has been found useful in various applications such as outlier detection. Together with the median, MAD is more robust to abnormal data than the mean and standard deviation (SD). Unfortunately, existing methods return only an approximate MAD that may be far from the exact one and thus mislead downstream applications. Computing the exact MAD is costly, however, especially in space, because the entire dataset must be stored in memory. In this paper, we propose COnstruction-REfinement Sketch (CORE-Sketch) for computing the exact MAD. The idea is to construct a sketch within limited space and gradually refine it to find the MAD element, i.e., the element whose distance to the median is exactly equal to the MAD. The mergeability and convergence of the method are analyzed, ensuring the correctness of the proposal and enabling parallel computation. Extensive experiments demonstrate that CORE-Sketch occupies significantly less space than the No-Sketch baseline, which stores the entire dataset, and has time and space costs comparable to those of the DD-Sketch method for approximate MAD.
ISSN: 2150-8097
DOI: 10.14778/3611479.3611491
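
The definition of MAD given in the abstract can be illustrated with a minimal Python sketch. This is not the CORE-Sketch algorithm; it is only the straightforward in-memory computation that the abstract describes as costly in space. The function name `exact_mad` and the sample data are hypothetical, and the use of the lower median (`statistics.median_low`) is an assumption made here so that the result always corresponds to an actual element of the data, in the spirit of the "MAD element" mentioned in the abstract.

```python
from statistics import median_low


def exact_mad(values):
    """Return (median, MAD) for a non-empty sequence of numbers.

    Holds the whole dataset plus a same-sized list of deviations in
    memory, i.e. O(n) space. The lower median is an assumed convention.
    """
    if not values:
        raise ValueError("values must be non-empty")
    med = median_low(values)
    # Absolute deviation of every element from the median.
    deviations = [abs(v - med) for v in values]
    # MAD is the median of those deviations.
    return med, median_low(deviations)


# Hypothetical data with one abnormal value (100).
data = [1, 2, 3, 4, 100]
med, mad = exact_mad(data)
print(med, mad)
```

For this sample the median is 3 and the deviations are 2, 1, 0, 1 and 97, so the MAD is 1. The single abnormal value 100 pulls the mean up to 22 but leaves both the median and the MAD unchanged, which is the robustness property the abstract refers to.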