Approximate string matching using compressed suffix arrays
Published in: | Theoretical Computer Science 2006-03, Vol.352 (1), p.240-249 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Let $T$ be a text of length $n$ and $P$ be a pattern of length $m$, both strings over a fixed finite alphabet $A$. The $k$-difference ($k$-mismatch, respectively) problem is to find all occurrences of $P$ in $T$ that have edit distance (Hamming distance, respectively) at most $k$ from $P$. In this paper we investigate a well-studied case in which $T$ is fixed and preprocessed into an indexing data structure so that any pattern query can be answered faster. We give a solution using an $O(n \log n)$-bit indexing data structure with $O(\lvert A \rvert^k m^k \cdot \max(k, \log n) + occ)$ query time, where $occ$ is the number of occurrences. The best previous result requires an $O(n \log n)$-bit indexing data structure and gives $O(\lvert A \rvert^k m^{k+2} + occ)$ query time. Our solution also allows us to exploit compressed suffix arrays to reduce the indexing space to $O(n)$ bits, while increasing the query time by only an $O(\log n)$ factor. |
---|---|
ISSN: | 0304-3975 1879-2294 |
DOI: | 10.1016/j.tcs.2005.11.022 |
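
The k-mismatch problem described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper: it is a naive illustration, assuming an uncompressed suffix array built by plain sorting, that enumerates every string within Hamming distance k of the pattern and looks each one up by binary search. The number of enumerated variants grows roughly like |A|^k m^k, which gives some intuition for that factor in the stated query bound. The function names (`build_suffix_array`, `find_exact`, `k_mismatch`) are hypothetical and chosen only for this example.

```python
from itertools import combinations, product


def build_suffix_array(text):
    # Naive construction: sort suffix start positions lexicographically.
    # Fine for short strings; real indexes use much faster constructions.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    # Binary search for the block of suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return [sa[i] for i in range(start, lo)]


def k_mismatch(text, pattern, k, alphabet):
    # Report every start position i such that text[i:i+m] differs from
    # `pattern` in at most k positions (Hamming distance <= k).
    sa = build_suffix_array(text)
    m = len(pattern)
    hits = set()
    for d in range(min(k, m) + 1):
        for positions in combinations(range(m), d):
            for letters in product(alphabet, repeat=d):
                # Require a real substitution at each chosen position,
                # so each variant is generated exactly once.
                if any(pattern[p] == c for p, c in zip(positions, letters)):
                    continue
                variant = list(pattern)
                for p, c in zip(positions, letters):
                    variant[p] = c
                hits.update(find_exact(text, sa, "".join(variant)))
    return sorted(hits)


if __name__ == "__main__":
    print(k_mismatch("banana", "nana", 1, "abn"))
```

With text `banana`, pattern `nana`, and k = 1, the substring `bana` at position 0 differs from the pattern in one character and `nana` at position 2 matches exactly, so both positions are reported. The k-difference variant would additionally need to enumerate insertions and deletions, which this sketch does not attempt.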