On Packing Very Large R-trees

Many emerging mobile applications require analyzing large spatial datasets. In these applications, efficient query processing relies on spatial access methods such as R-trees. For datasets that are fairly static, R-trees are often built as a data loading process using packing techniques. However, tr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Haoyu Tan, Wuman Luo, Huajian Mao, Ni, Lionel M.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Many emerging mobile applications require analyzing large spatial datasets. In these applications, efficient query processing relies on spatial access methods such as R-trees. For datasets that are fairly static, R-trees are often built as a data loading process using packing techniques. However, traditional R-tree packing algorithms can only run on a single machine and thereby cannot scale to very large datasets. In this paper, we design and implement a general framework for parallel Rtree packing using MapReduce. This framework sequentially packs each R-tree level from bottom up. For lower levels that have a large number of rectangles, we propose a partition based algorithm for parallel packing. We also discuss two spatial partitioning methods that can efficiently handle heavily skewed datasets. To evaluate the performance, we conducted extensive experiments using large real datasets. The size of the datasets is up to 100GB and the number of spatial objects is up to 2 billion. Besides range queries, k-nearest neighbor searches and spatial joins are also used for evaluation. To the best of our knowledge, it is the first work that evaluates the query performance of packed R-trees on such large datasets with spatial queries other than range queries. The results confirm the scalability of our proposed framework and parallel packing algorithms. It is also shown that our packed R-trees have good query performance and optimal space utilization.
ISSN:1551-6245
2375-0324
DOI:10.1109/MDM.2012.40