The partitioned-layer index: Answering monotone top-k queries using the convex skyline and partitioning-merging technique
Published in: | Information Sciences 2009-09, Vol. 179 (19), p. 3286-3308 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | A top-k query returns k tuples with the highest (or the lowest) scores from a relation. The score is computed by combining the values of one or more attributes. We focus on top-k queries having monotone linear score functions. Layer-based methods are well-known techniques for top-k query processing. These methods organize a database as a single list of layers, where the i-th layer contains the tuples that can be the top-i tuple. Thus, these methods answer top-k queries by reading at most k layers. Query performance, however, is poor when the number of tuples in each layer (simply, the layer size) is large. In this paper, we propose a new layer-ordering method, called the Partitioned-Layer Index (simply, the PL Index), that significantly improves query performance by reducing the layer size. The PL Index uses the notion of partitioning, which constructs a database as multiple sublayer lists instead of a single layer list, thereby reducing the layer size. The PL Index also uses the convex skyline, which is a subset of the skyline, to construct a sublayer and further reduce the layer size. The PL Index has the following desired properties. Its query performance is quite insensitive to the weights of attributes (called the preference vector) of the score function and is approximately linear in the value of k. The PL Index is capable of tuning query performance for the most frequently used value of k by controlling the number of sublayer lists. Experimental results using synthetic and real data sets show that the PL Index significantly outperforms existing methods in query performance, except for small values of k (say, k ≤ 9). |
---|---|
ISSN: | 0020-0255, 1872-6291 |
DOI: | 10.1016/j.ins.2009.05.016 |
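The abstract's description of layer-based processing (the i-th layer holds tuples that can be the top-i tuple, so a top-k query reads at most k layers) can be illustrated with a small sketch. This is a generic single-list layer method, not the PL Index: it assumes two attributes only, larger-is-better scores, and non-negative weights, and it peels convex-skyline layers from one list without the paper's partitioning into multiple sublayer lists. All names (`convex_skyline`, `build_layers`, `top_k`) are hypothetical. The correctness argument: every tuple t in layer j was still present when each earlier layer was peeled, and each peeled layer contains a maximiser of the score over what remained, so at least j - 1 tuples score at least as high as t; hence the first k layers always contain a valid top-k answer.

```python
from typing import List, Sequence, Tuple

# Hypothetical 2-D tuple (attribute_1, attribute_2); larger values are better.
Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """True if p is at least as large as q in both attributes and differs from q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def skyline(points: Sequence[Point]) -> List[Point]:
    """Tuples not dominated by any other tuple (quadratic check, kept simple)."""
    return [q for q in points if not any(dominates(p, q) for p in points)]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_skyline(points: Sequence[Point]) -> List[Point]:
    """Skyline tuples on the upper convex-hull chain (2-D, maximisation).

    For every non-negative weight vector, some tuple that maximises the
    linear score over `points` is in the returned list.
    """
    chain: List[Point] = []
    for p in sorted(skyline(points)):  # ascending x, hence non-increasing y
        # Drop the last vertex unless the chain turns clockwise at it.
        while len(chain) >= 2 and cross(chain[-2], chain[-1], p) >= 0:
            chain.pop()
        chain.append(p)
    return chain


def build_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel convex-skyline layers until every tuple belongs to exactly one layer."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = convex_skyline(remaining)
        layers.append(layer)
        for p in layer:
            remaining.remove(p)  # removes one occurrence, so duplicates stay behind
    return layers


def top_k(layers: List[List[Point]], weights: Tuple[float, float], k: int) -> List[Point]:
    """Answer a top-k query (non-negative weights) by reading only the first k layers."""
    def score(p: Point) -> float:
        return weights[0] * p[0] + weights[1] * p[1]

    candidates = [p for layer in layers[:k] for p in layer]
    return sorted(candidates, key=score, reverse=True)[:k]


if __name__ == "__main__":
    data = [(1, 9), (2, 8), (6, 6), (8, 3), (9, 1), (5, 5), (4, 4), (2, 2)]
    layers = build_layers(data)
    print(layers)  # [[(1, 9), (6, 6), (8, 3), (9, 1)], [(2, 8), (5, 5)], [(4, 4)], [(2, 2)]]
    print(top_k(layers, (0.5, 0.5), 2))  # [(6, 6), (8, 3)]
```

In this toy data, (2, 8) is on the skyline but not on the convex skyline, so it lands in the second layer; a query with weights (0, 1) and k = 2 needs it, which is why the second layer must be read. Large layers make each query score many candidates, which is the cost the abstract says the PL Index reduces.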