Worst-Case I/O-Efficient Skyline Algorithms

Bibliographic details
Published in: ACM Transactions on Database Systems, 2012-12, Vol. 37 (4), pp. 1-22
Main authors: Sheng, Cheng; Tao, Yufei
Format: Article
Language: English
Description
Abstract: We consider the skyline problem (also known as the maxima problem), which has been extensively studied in the database community. The input is a set $P$ of $d$-dimensional points. A point dominates another if the coordinate of the former is at most that of the latter in every dimension. The goal is to find the skyline, which is the set of points $p \in P$ such that $p$ is not dominated by any other point in $P$. The main result of this article is that, for any fixed dimensionality $d \ge 3$, the skyline problem can be solved in external memory with $O\big((N/B)\log_{M/B}^{d-2}(N/B)\big)$ I/Os in the worst case, where $N$ is the cardinality of $P$, $B$ the size of a disk block, and $M$ the capacity of main memory. Similar bounds can also be achieved for computing several skyline variants, including the $k$-dominant skyline, the $k$-skyband, and the $\alpha$-skyline. Furthermore, the performance can be improved if some dimensions of the data space have small domains. When the dimensionality $d$ is not fixed, the challenge is to outperform the naive algorithm that simply checks all pairs of points in $P \times P$. We give an algorithm that terminates in $O\big((N/B)\log^{d-2} N\big)$ I/Os, thus beating the naive solution for any $d = O(\log N / \log\log N)$.
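To make the dominance and skyline definitions concrete, here is a minimal in-memory Python sketch of the naive approach the abstract mentions, which checks every pair in $P \times P$. It is not the article's external-memory algorithm, and the function names `dominates`, `naive_skyband`, and `naive_skyline` are hypothetical. As in the definition above, smaller coordinates are better. The sketch additionally requires strict inequality in at least one dimension so that identical points do not eliminate each other; for distinct points this coincides with the definition above. The `k` parameter assumes the usual definition of the $k$-skyband (points dominated by fewer than $k$ other points), so $k = 1$ yields the skyline.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is at most q in every dimension and strictly smaller in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyband(points: List[Point], k: int = 1) -> List[Point]:
    """Return the points dominated by fewer than k other points (hypothetical helper).

    Every point is tested against every other point, i.e. Theta(N^2) dominance
    tests in the worst case; k = 1 gives the skyline.
    """
    result: List[Point] = []
    for i, p in enumerate(points):
        dominators = 0
        for j, q in enumerate(points):
            if i != j and dominates(q, p):
                dominators += 1
                if dominators >= k:
                    break  # p already has k dominators, so it cannot qualify
        if dominators < k:
            result.append(p)
    return result


def naive_skyline(points: List[Point]) -> List[Point]:
    """Skyline = points not dominated by any other point (the 1-skyband)."""
    return naive_skyband(points, k=1)
```

For example, with `P = [(1, 4), (2, 2), (3, 1), (2, 3), (4, 4)]`, `naive_skyline(P)` returns `[(1, 4), (2, 2), (3, 1)]`, while `naive_skyband(P, 2)` also keeps `(2, 3)`, which only `(2, 2)` dominates. This quadratic pairwise check is the baseline that the abstract's bounds are measured against.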
ISSN: 0362-5915, 1557-4644
DOI: 10.1145/2389241.2389245