DBM-Tree: Trading height-balancing for performance in metric access methods
Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as the range and the k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve the query performance minimizing the number of disk accesses, keeping a constant height of...
Gespeichert in:
Veröffentlicht in: | Journal of the Brazilian Computer Society 2005-10, Vol.11 (3), p.37-51 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as the range and the k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve the query performance minimizing the number of disk accesses, keeping a constant height of the structures stored on disks (height-balanced trees). However, the overlapping between their nodes has a very high influence on their performance. This paper presents a new dynamic MAM called the
DBM
-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. An underpinning for cost estimation on tree structures is their height, so we show a non-height dependable cost model that can be applied for DBM-tree. Moreover, an optimization algorithm called
Shrink
is also presented, which improves the performance of an already built
DBM
-tree by reorganizing the elements among their nodes. Experiments performed over both synthetic and real world datasets showed that the
DBM
-tree is, in average, 50% faster than traditional MAM and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After performing the Shrink algorithm, the performance improves up to 40% regarding the number of disk accesses for range and
k
-nearest neighbor queries. In addition, the
DBM
-tree scales up well, exhibiting linear performance with growing number of elements in the database. |
---|---|
ISSN: | 0104-6500 1678-4804 1678-4804 |
DOI: | 10.1007/BF03192381 |