DBM-Tree: Trading height-balancing for performance in metric access methods

Bibliographic details
Published in:	Journal of the Brazilian Computer Society 2005-10, Vol.11 (3), p.37-51
Main authors:	Vieira, Marcos R., Traina, Caetano, Chino, Fabio J. T., Traina, Agma J. M.
Format:	Article
Language:	English
Subjects:	COMPUTER SCIENCE, INFORMATION SYSTEMS Trees
Online access:	Full text

Description
Abstract:	Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as range and k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve query performance by minimizing the number of disk accesses, keeping a constant height for the structures stored on disk (height-balanced trees). However, the overlap between their nodes has a very strong influence on their performance. This paper presents a new dynamic MAM called the DBM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. Cost estimation on tree structures is usually underpinned by their height, so we show a cost model that does not depend on height and can be applied to the DBM-tree. Moreover, an optimization algorithm called Shrink is also presented, which improves the performance of an already built DBM-tree by reorganizing the elements among its nodes. Experiments performed over both synthetic and real-world datasets showed that the DBM-tree is, on average, 50% faster than traditional MAM and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After performing the Shrink algorithm, the performance improves by up to 40% regarding the number of disk accesses for range and k-nearest neighbor queries. In addition, the DBM-tree scales up well, exhibiting linear performance with a growing number of elements in the database.
ISSN:	0104-6500, 1678-4804
DOI:	10.1007/BF03192381
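
The abstract mentions range queries over metric trees whose subtrees may differ in height. The following is a minimal Python sketch of that general idea, not the DBM-tree algorithm from the paper: a range query that prunes a subtree with the triangle inequality whenever the query ball cannot intersect the subtree's covering ball. The names `Node`, `range_query`, `euclidean`, and the example tree `root` are hypothetical and chosen only for illustration; node splitting, disk pages, and the Shrink optimization are not modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node layout for illustration, not the paper's on-disk format.
    center: tuple                                   # routing object of this node
    radius: float = 0.0                             # covering radius of the whole subtree
    children: list = field(default_factory=list)    # sub-nodes; heights may differ
    points: list = field(default_factory=list)      # elements stored in this node


def euclidean(a, b):
    """Example metric; any function obeying the triangle inequality works."""
    return math.dist(a, b)


def range_query(node, query, r, dist=euclidean, stats=None):
    """Return all stored elements within distance r of query.

    stats, if given, counts how many distance computations were made.
    """
    if stats is None:
        stats = {"distance_calls": 0}
    stats["distance_calls"] += 1
    d = dist(query, node.center)
    # Triangle inequality: if d > r + radius, the query ball and the node's
    # covering ball are disjoint, so nothing below this node can qualify.
    if d > r + node.radius:
        return []
    hits = []
    for p in node.points:
        stats["distance_calls"] += 1
        if dist(query, p) <= r:
            hits.append(p)
    for child in node.children:
        hits.extend(range_query(child, query, r, dist, stats))
    return hits


# A small hand-built, deliberately unbalanced tree: the region around
# (10, 10) is nested one level deeper than the region around (0, 0).
sparse = Node(center=(0, 0), radius=1.5, points=[(0, 0), (1, 0), (0, 1)])
deep = Node(center=(10, 10), radius=0.5, points=[(10, 10), (10.3, 10.0)])
dense = Node(center=(10, 10), radius=2.0, children=[deep], points=[(11, 11)])
root = Node(center=(5, 5), radius=9.0, children=[sparse, dense])

if __name__ == "__main__":
    counters = {"distance_calls": 0}
    print(range_query(root, (0.2, 0.1), 1.0, stats=counters), counters)
```

In this sketch the deeper subtree around (10, 10) is discarded after a single distance computation against its routing object, which is the kind of saving in distance calculations and node accesses that the abstract reports measuring.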