Sparse prefix sums: Constant-time range sum queries over sparse multidimensional data cubes

Prefix sums are a powerful technique to answer range-sum queries over multi-dimensional arrays in O(1) time by looking up a constant number of values in an array of size O(N), where N is the number of cells in the multi-dimensional array. However, the technique suffers from O(N) update and storage c...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Information systems (Oxford) 2019-05, Vol.82, p.136-147
Hauptverfasser: Shekelyan, Michael, Dignös, Anton, Gamper, Johann
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Prefix sums are a powerful technique to answer range-sum queries over multi-dimensional arrays in O(1) time by looking up a constant number of values in an array of size O(N), where N is the number of cells in the multi-dimensional array. However, the technique suffers from O(N) update and storage costs. Relative prefix sums address the high update costs by partitioning the array into blocks, thereby breaking the dependency between cells. In this paper, we present sparse prefix sums that exploit data sparsity to reduce the high storage costs of relative prefix sums. By building upon relative prefix sums, sparse prefix sums achieve the same update complexity as relative prefix sums. The authors of relative prefix sums erroneously claimed that the update complexity is O(N) for any number of dimensions. We show that this claim holds only for two dimensions, whereas the correct complexity for an arbitrary number of d dimensions is O(Nd−1d). To reduce the storage costs, the sparse prefix sums technique exploits sparsity in the data and avoids to materialize prefix sums for empty rows and columns in the data grid; instead, look-up tables are used to preserve constant query time. Sparse prefix sums are the first approach to achieve O(1) query time with sub-linear storage costs for range-sum queries over sparse low-dimensional arrays. A thorough experimental evaluation shows that the approach works very well in practice. On the tested real-world data sets the storage costs are reduced by an order of magnitude with only a small overhead in query time, thus preserving microsecond-fast query answering. •Proof of correct update complexity for relative prefix sums.•Reduced storage costs of sparse prefix sums for more dimensions.•Experiments on real-world datasets up to seven dimensions.
ISSN:0306-4379
1873-6076
DOI:10.1016/j.is.2018.06.009