Approximate string matching using compressed suffix arrays

Bibliographic details
Published in:	Theoretical Computer Science, 2006-03, Vol. 352 (1), p. 240-249
Main authors:	Huynh, Trinh N.D.; Hon, Wing-Kai; Lam, Tak-Wah; Sung, Wing-Kin
Format:	Article
Language:	English
Subjects:	Algorithmics. Computability. Computer arithmetics; Applied sciences; Computer science; control theory; systems; Data processing. List processing. Character string processing; Exact sciences and technology; Memory organisation. Data processing; Software; Theoretical computing
Online access:	Full text

Description
Abstract:	Let $T$ be a text of length $n$ and $P$ be a pattern of length $m$, both strings over a fixed finite alphabet $A$. The $k$-difference ($k$-mismatch, respectively) problem is to find all occurrences of $P$ in $T$ that have edit distance (Hamming distance, respectively) at most $k$ from $P$. In this paper we investigate a well-studied case in which $T$ is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an $O(n \log n)$-bit indexing data structure with $O(|A|^k m^k \cdot \max(k, \log n) + occ)$ query time, where $occ$ is the number of occurrences. The best previous result requires an $O(n \log n)$-bit indexing data structure and gives $O(|A|^k m^{k+2} + occ)$ query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to $O(n)$ bits, while increasing the query time by only an $O(\log n)$ factor.
ISSN:	0304-3975, 1879-2294
DOI:	10.1016/j.tcs.2005.11.022
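
Illustration:	The k-mismatch problem stated in the abstract can be made concrete with a small sketch. The code below is not the algorithm of the paper: it is a naive illustration that assumes a plain (uncompressed) suffix array built by sorting, enumerates every string within Hamming distance k of the pattern, and looks each one up by binary search. All function names are hypothetical and chosen only for this example. The number of enumerated variants grows roughly like the $|A|^k m^k$ term in the query bounds, which is one way to see where that factor comes from.

```python
from itertools import combinations, product


def build_suffix_array(text):
    """Plain suffix array by sorting suffixes; only suitable for short texts."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def exact_occurrences(text, sa, pattern):
    """Start positions of exact matches, via two binary searches on the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return [sa[i] for i in range(start, lo)]


def hamming_neighbourhood(pattern, alphabet, k):
    """Yield each string at Hamming distance <= k from pattern exactly once."""
    m = len(pattern)
    for d in range(min(k, m) + 1):
        for positions in combinations(range(m), d):
            # At each chosen position, substitute a character that differs.
            choices = [[c for c in alphabet if c != pattern[p]] for p in positions]
            for subs in product(*choices):
                variant = list(pattern)
                for p, c in zip(positions, subs):
                    variant[p] = c
                yield "".join(variant)


def k_mismatch(text, sa, pattern, alphabet, k):
    """All start positions i with Hamming(text[i:i+m], pattern) <= k."""
    hits = set()
    for variant in hamming_neighbourhood(pattern, alphabet, k):
        hits.update(exact_occurrences(text, sa, variant))
    return sorted(hits)


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    alphabet = sorted(set(text))
    print(k_mismatch(text, sa, "abr", alphabet, 1))  # [0, 7]
    print(k_mismatch(text, sa, "abr", alphabet, 2))  # [0, 3, 5, 7]
```

The k-difference (edit distance) variant would additionally need insertions and deletions in the enumeration, which this sketch leaves out.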