Optimization of signature file parameters for databases with varying record lengths

Bibliographic details
Published in: Computer journal 1999, Vol. 42 (1), p. 11-23
Main authors: KOCBERBER, S; CAN, F; PATTON, J. M
Format: Article
Language: English
Description
Abstract: For signature files we propose a new false drop estimation method for databases with varying record lengths. Our approach provides more accurate estimation of the number of false drops by considering the lengths of individual records instead of using the average number of terms per record. In signature file processing, accurate estimation of the number of false drops is essential to obtain a more accurate signature file and therefore to obtain a better (query) response time. With a formal proof we show that under certain conditions the number of false drops estimated by considering the average record length is less than or equal to the precise 'expected' estimation which is based on the individual record lengths. The experiments with real data show that the proposed method accurately estimates the number of false drops and the actual response time. Depending on the space overhead, our approach obtains up to 33% and 20% response time improvements for the conventional sequential and new efficient multiframe signature file methods, respectively.
ISSN: 0010-4620; 1460-2067
DOI: 10.1093/comjnl/42.1.11
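
The abstract's contrast between estimating false drops from the average record length and from individual record lengths can be illustrated with a small sketch. This is not the paper's method. It assumes the textbook superimposed-coding (Bloom-filter-style) approximation for a single-term query, in which each term sets `bits_per_term` bits in a signature of `sig_bits` bits, and it treats every record as a potential false drop. All function and parameter names are hypothetical.

```python
import math


def false_drop_prob(num_terms, sig_bits, bits_per_term):
    """Textbook approximation (an assumption, not the paper's formula):
    probability that a record with num_terms terms falsely matches a
    single-term query under superimposed coding."""
    # Probability that a given signature bit is set after hashing all terms.
    bit_set = 1.0 - (1.0 - bits_per_term / sig_bits) ** num_terms
    # A false drop needs all bits of the query term to be set by chance.
    return bit_set ** bits_per_term


def false_drops_from_average(record_lengths, sig_bits, bits_per_term):
    """Estimate using only the average number of terms per record."""
    avg = sum(record_lengths) / len(record_lengths)
    return len(record_lengths) * false_drop_prob(avg, sig_bits, bits_per_term)


def false_drops_from_individual(record_lengths, sig_bits, bits_per_term):
    """Estimate that sums the per-record probabilities."""
    return sum(false_drop_prob(d, sig_bits, bits_per_term) for d in record_lengths)


if __name__ == "__main__":
    # Illustrative, made-up record lengths: three short records, one long one.
    lengths = [10, 10, 10, 170]
    print(false_drops_from_average(lengths, sig_bits=512, bits_per_term=4))
    print(false_drops_from_individual(lengths, sig_bits=512, bits_per_term=4))
```

For these made-up lengths the average-based estimate comes out lower than the per-record estimate, which is the direction of the inequality the abstract states under certain conditions. When all records have the same length, the two estimates coincide.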