Optimal and efficient generalized twig pattern processing: a combination of preorder and postorder filterings
Searching for occurrences of a twig pattern query (TPQ) in an XML document is a core task of all XML database query languages. The generalized twig pattern (GTP) extends the TPQ model to include semantics related to output nodes, optional nodes, and boolean expressions which are part of the XQuery l...
Gespeichert in:
Veröffentlicht in: | The VLDB journal 2013-06, Vol.22 (3), p.369-393 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Searching for occurrences of a twig pattern query (TPQ) in an XML document is a core task of all XML database query languages. The generalized twig pattern (GTP) extends the TPQ model to include semantics related to output nodes, optional nodes, and boolean expressions which are part of the XQuery language.
Preorder filtering
holistic algorithms such as TwigStack represent a significant class of TPQ processing approaches with a linear worst-case I/O complexity with respect to the sum of the input and output sizes for some query classes. Another important class of holistic approaches is represented by
postorder filtering
holistic algorithms such as
Stack which introduced a linear output enumeration time with respect to the result size. In this article, we introduce a holistic algorithm called GTPStack which is the first approach capable of processing a GTP with a linear worst-case I/O complexity with respect to the GTP result size. This is achieved by using a combination of the preorder and postorder filterings before storing nodes in an intermediate storage. Additionally, another contribution of this article is an introduction of a new perspective of holistic algorithm optimality. We show that the optimality depends not only on a query class but also on XML document characteristics. This new view on the optimality extends the general knowledge about the type of queries for which the holistic algorithms are optimal. Moreover, it allows us to determine that GTPStack is optimal for any GTP when a specific XML document is considered. We present a comprehensive experimental study of the state-of-the-art holistic algorithms showing under which conditions GTPStack outperforms the other holistic approaches. |
---|---|
ISSN: | 1066-8888 0949-877X |
DOI: | 10.1007/s00778-012-0295-5 |