Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays
Published in: | Algorithmica, 2008-11, Vol. 52 (3), pp. 350-377 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | Suffix trees and suffix arrays are fundamental full-text index data structures for solving problems that arise in string processing. Because suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently with suffix trees and others with suffix arrays. We consider efficient index data structures that have the capabilities of both suffix trees and suffix arrays without requiring much space. When the alphabet is small, enhanced suffix arrays are such index data structures. However, when the alphabet is large, enhanced suffix arrays lose the power of suffix trees: pattern search in an enhanced suffix array takes $O(m|\Sigma|)$ time, whereas pattern search in a suffix tree takes $O(m\log|\Sigma|)$ time, where $m$ is the length of the pattern and $\Sigma$ is the alphabet.
In this paper, we present linearized suffix trees, which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the alphabet is large. A linearized suffix tree has all the functionality of the enhanced suffix array and supports pattern search in $O(m\log|\Sigma|)$ time. From a different point of view, it can be considered a practical implementation of the suffix tree that supports $O(m\log|\Sigma|)$-time pattern search.
In addition, we present two efficient algorithms for computing suffix links on the enhanced suffix array and on the linearized suffix tree. These are the first algorithms that run in $O(n)$ time, where $n$ is the length of the text, without using range minimum queries. Our experimental results show that our algorithms are faster than previous algorithms. |
---|---|
ISSN: | 0178-4617 (print), 1432-0541 (online) |
DOI: | 10.1007/s00453-007-9061-2 |
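The abstract treats the enhanced suffix array as a suffix array extended with auxiliary tables. As a minimal sketch (Python is chosen only for illustration, and the function names are hypothetical), the two tables such an index starts from can be built as follows: the suffix array by naive sorting, which is only suitable for small inputs, and the LCP array with Kasai et al.'s linear-time algorithm. The child table and the other tables of the enhanced suffix array, and whatever layout the linearized suffix tree uses, are not shown.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions lexicographically.
    Worst case O(n^2 log n); for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.'s linear-time LCP construction.
    lcp[k] is the length of the longest common prefix of suffixes
    sa[k-1] and sa[k]; lcp[0] is defined as 0."""
    n = len(text)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, reusing h
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0  # the smallest suffix has no predecessor
    return lcp


if __name__ == "__main__":
    t = "banana"
    sa = suffix_array(t)
    print(sa)                # [5, 3, 1, 0, 4, 2]
    print(lcp_array(t, sa))  # [0, 1, 3, 0, 0, 2]
```

For `banana`, the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, which yields the suffix array and LCP values printed above.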
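Roughly, the gap between $O(m|\Sigma|)$ and $O(m\log|\Sigma|)$ in the abstract comes from how a search chooses the child to descend into: examining the children of a node one by one costs $O(|\Sigma|)$ per step, whereas binary search over children sorted by first character costs $O(\log|\Sigma|)$. The sketch below illustrates only that binary-search step on lcp-intervals (the suffix-array counterpart of internal suffix-tree nodes). It is not the linearized suffix tree's encoding, which the abstract does not describe: child intervals are rediscovered here by a naive scan of the LCP array, the function names are hypothetical, and the text is assumed to end with a unique end marker `$` that sorts before every other character.

```python
from bisect import bisect_left


def build(text):
    """Naive suffix array and LCP array; lcp[k] = LCP of suffixes sa[k-1] and sa[k]."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    lcp = [0] * n
    for k in range(1, n):
        a, b = sa[k - 1], sa[k]
        h = 0
        while a + h < n and b + h < n and text[a + h] == text[b + h]:
            h += 1
        lcp[k] = h
    return sa, lcp


def find(text, sa, lcp, pattern):
    """Top-down search over lcp-intervals. Returns the inclusive suffix-array
    interval (lo, hi) of suffixes that start with `pattern`, or None."""
    lo, hi, depth = 0, len(text) - 1, 0  # start at the root interval
    while depth < len(pattern):
        if lo == hi:  # a single suffix (leaf): compare the rest directly
            s = sa[lo]
            return (lo, hi) if text[s:s + len(pattern)] == pattern else None
        ell = min(lcp[lo + 1:hi + 1])  # common-prefix length of this interval
        s = sa[lo]
        # Match the pattern along the "edge" down to depth ell.
        while depth < min(ell, len(pattern)):
            if text[s + depth] != pattern[depth]:
                return None
            depth += 1
        if depth == len(pattern):
            break
        # Child intervals start at lo and at every k with lcp[k] == ell (naive scan).
        starts = [lo] + [k for k in range(lo + 1, hi + 1) if lcp[k] == ell]
        ends = [k - 1 for k in starts[1:]] + [hi]
        # Children are ordered by their character at depth ell: binary search them.
        firsts = [text[sa[k] + ell] for k in starts]
        t = bisect_left(firsts, pattern[depth])
        if t == len(firsts) or firsts[t] != pattern[depth]:
            return None
        lo, hi = starts[t], ends[t]
    return (lo, hi)


if __name__ == "__main__":
    text = "mississippi$"
    sa, lcp = build(text)
    lo, hi = find(text, sa, lcp, "ssi")
    print(sorted(sa[lo:hi + 1]))  # [2, 5]: "ssi" occurs at positions 2 and 5
```

If each node's children were stored as a sorted array instead of being rediscovered by scanning, every descent step would cost $O(\log|\Sigma|)$ comparisons plus the edge characters matched, which is where a bound of the form $O(m\log|\Sigma|)$ comes from.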
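The abstract does not describe the new suffix-link algorithms, so the following is only a naive reference for what they compute, which could serve to test a faster implementation. An lcp-interval whose common prefix is $c\,u$ (one character $c$ followed by a string $u$) has a suffix link to the lcp-interval of $u$. Because removing the first character preserves the relative order of suffixes that share $c$, the target is the widest suffix-array range around $\mathrm{rank}[SA[i]+1]$ to $\mathrm{rank}[SA[j]+1]$ whose suffixes still share $|u|$ characters. This sketch (Python, hypothetical names, text assumed to end with a unique `$`) enumerates lcp-intervals with a standard stack-based pass over the LCP array and widens each range by a linear scan, so it is quadratic in the worst case; it is neither the paper's $O(n)$ algorithm nor a range-minimum-query method.

```python
def build(text):
    """Naive suffix array, inverse suffix array (rank) and LCP array."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for k, p in enumerate(sa):
        rank[p] = k
    lcp = [0] * n
    for k in range(1, n):
        a, b = sa[k - 1], sa[k]
        h = 0
        while a + h < n and b + h < n and text[a + h] == text[b + h]:
            h += 1
        lcp[k] = h
    return sa, rank, lcp


def lcp_intervals(lcp):
    """Enumerate lcp-intervals (ell, lb, rb) in one left-to-right stack pass;
    the root (0, 0, n - 1) is reported last."""
    n = len(lcp)
    out, stack = [], [(0, 0)]  # stack entries: (lcp value, left bound)
    for k in range(1, n + 1):
        cur = lcp[k] if k < n else 0  # position n acts as lcp 0 to close intervals
        lb = k - 1
        while cur < stack[-1][0]:
            ell, lb = stack.pop()
            out.append((ell, lb, k - 1))
        if cur > stack[-1][0]:
            stack.append((cur, lb))
    out.append((0, 0, n - 1))
    return out


def suffix_links(text):
    """Map each lcp-interval with label c+u (ell >= 1) to the lcp-interval of u."""
    sa, rank, lcp = build(text)
    n = len(text)
    links = {}
    for ell, lb, rb in lcp_intervals(lcp):
        if ell == 0:
            continue  # the root has no suffix link
        # Dropping the shared first character keeps these suffixes in sorted order.
        a, b = rank[sa[lb] + 1], rank[sa[rb] + 1]
        # Naive linear scan: widen [a, b] while neighbours share ell - 1 characters.
        while a > 0 and lcp[a] >= ell - 1:
            a -= 1
        while b + 1 < n and lcp[b + 1] >= ell - 1:
            b += 1
        links[(ell, lb, rb)] = (ell - 1, a, b)
    return links
```

For `mississippi$`, for example, the interval labelled `ssi` links to the interval labelled `si`, and every interval with a one-character label links to the root.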