VA-Files vs. R-Trees in Distance Join Queries

Bibliographic Details
Main authors:	Corral, Antonio; D’Ermiliis, Alejandro; Manolopoulos, Yannis; Vassilakopoulos, Michael
Format:	Book chapter
Language:	English
Subjects:	Applied sciences; Close Pair; Compact Approximation; Computer science, control theory, systems; Exact sciences and technology; Information systems. Data bases; Memory organisation. Data processing; Page Access; Similarity Join; Software; Vector File
Online access:	Full text

Description
Summary:	In modern database applications the similarity of complex objects is examined by performing distance-based queries (e.g. nearest neighbour search) on data of high dimensionality. Most multidimensional indexing methods have failed to support these queries efficiently in arbitrary high-dimensional datasets (due to the dimensionality curse). Similarity join queries and K closest pairs queries are the most representative distance join queries, in which two high-dimensional datasets are combined. These queries are very expensive in terms of response time and I/O activity in high-dimensional spaces. On the other hand, the filtering-based approach, as applied by the VA-file, has turned out to be a very promising alternative for nearest neighbour search. In general, the filtering-based approach represents vectors as compact approximations, so that, by first scanning these approximations, only a small fraction of the real vectors needs to be visited. Here, we elaborate on VA-files and develop VA-file based algorithms for answering similarity join and K closest pairs queries on high-dimensional data. We also compare the performance of VA-files and R*-trees (a structure that has proven to be robust) for answering these queries. The results of the comparison do not lead to a clear winner.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/11547686_12
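
Illustration:	The filtering idea described in the summary can be sketched roughly as follows. This is not the algorithm from the chapter. It is a minimal nested-loop similarity join, assuming vectors in the unit hypercube, a uniform grid with a fixed number of bits per dimension, and Euclidean distance. The names `BITS`, `approximate`, `lower_bound` and `similarity_join` are hypothetical and chosen only for this example.

```python
import math
import random

BITS = 3              # assumed bits per dimension for the approximation
CELLS = 1 << BITS     # number of grid cells per dimension over [0, 1]


def approximate(vec):
    """Compact approximation: the grid cell index of each coordinate."""
    return tuple(min(int(x * CELLS), CELLS - 1) for x in vec)


def lower_bound(a, b):
    """Smallest Euclidean distance possible between points in cells a and b."""
    total = 0.0
    for ca, cb in zip(a, b):
        # Cells that are equal or adjacent in a dimension can hold
        # arbitrarily close coordinates, so they contribute no gap.
        gap = max(abs(ca - cb) - 1, 0) / CELLS
        total += gap * gap
    return math.sqrt(total)


def similarity_join(P, Q, eps):
    """Return index pairs (i, j) with dist(P[i], Q[j]) <= eps.

    Also returns how many pairs survived the filter step, i.e. how many
    times the real vectors had to be compared.
    """
    approx_p = [approximate(p) for p in P]
    approx_q = [approximate(q) for q in Q]
    result, visited = [], 0
    for i, ap in enumerate(approx_p):
        for j, aq in enumerate(approx_q):
            # Filter step: only the compact approximations are inspected.
            if lower_bound(ap, aq) > eps:
                continue
            # Refinement step: compare the real vectors.
            visited += 1
            if math.dist(P[i], Q[j]) <= eps:
                result.append((i, j))
    return result, visited


if __name__ == "__main__":
    random.seed(0)
    P = [[random.random() for _ in range(8)] for _ in range(200)]
    Q = [[random.random() for _ in range(8)] for _ in range(200)]
    pairs, visited = similarity_join(P, Q, eps=0.4)
    print(len(pairs), "result pairs;", visited, "of", len(P) * len(Q), "pairs refined")
```

Because the lower bound never exceeds the true distance, the filter step cannot discard a qualifying pair; it only reduces how many real vectors are compared. How much it saves depends on the data distribution, the dimensionality and the number of bits per dimension.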