TVSBS: A fast exact pattern matching algorithm for biological sequences

Bibliographic details
Published in: Current science (Bangalore) 2006-07, Vol.91 (1), p.47-53
Main authors: Thathoo, Rahul; Virmani, Ashish; Lakshmi, S. Sai; Balakrishnan, N.; Sekar, K.
Format: Article
Language: English
Online access: Full text
Description
Summary: The post-genomic era is witnessing a remarkable increase in the number of nucleotide and amino acid sequences, and the content of biological sequence databases doubles at frequent intervals. Pattern matching has emerged as a powerful tool for locating nucleotide or amino acid sequence patterns in these databases. Several pattern-matching algorithms are available in the literature, ranging from the basic brute-force algorithm to the recent SSABS. The efficiency of these algorithms depends on how quickly they identify exact occurrences of the pattern in the text. In this article, we propose an exact pattern-matching algorithm for biological sequences. The proposed algorithm, TVSBS, is a combination of the Berry–Ravindran and SSABS algorithms. Its performance is improved by using the shift from the Berry–Ravindran bad-character table, which leads to fewer character comparisons. It works consistently well for both nucleotide and amino acid sequences. The proposed algorithm has been compared with the recent SSABS algorithm. The results show the robustness of the proposed algorithm, so it can be incorporated into any exact pattern-matching application involving biological sequences. The best- and worst-case time complexities of the new algorithm are also outlined.
ISSN:0011-3891
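
The summary describes TVSBS as an SSABS-style search combined with the Berry–Ravindran bad-character shift. The following is a minimal Python sketch of that idea, not the authors' implementation. It rests on two assumptions that the record itself does not spell out: that the SSABS comparison order checks the last character of the window first, then the first, then the rest; and that the Berry–Ravindran shift is looked up from the two text characters immediately to the right of the current window. The names `tvsbs_search` and `pair_shift` are illustrative only.

```python
def tvsbs_search(text, pattern):
    """Return the start indices of all exact occurrences of pattern in text.

    Sketch only: assumes an SSABS-like comparison order and a
    Berry-Ravindran-style shift; not taken from the article itself.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    first, last = pattern[0], pattern[-1]

    # Shift for each adjacent pair pattern[i], pattern[i+1]: m - i.
    # Later (rightmost) pairs overwrite earlier ones, giving the smaller shift.
    pair_shift = {}
    for i in range(m - 1):
        pair_shift[(pattern[i], pattern[i + 1])] = m - i

    matches = []
    j = 0
    while j <= n - m:
        # Assumed SSABS order: last character, then first, then the middle.
        # (The middle is compared with a slice here for brevity.)
        if (text[j + m - 1] == last
                and text[j] == first
                and text[j + 1:j + m - 1] == pattern[1:m - 1]):
            matches.append(j)

        # Two characters just past the window; None marks "beyond the text".
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None

        # Berry-Ravindran-style shift: the smallest applicable case wins.
        if a == last:
            j += 1
        elif (a, b) in pair_shift:
            j += pair_shift[(a, b)]
        elif b == first:
            j += m + 1
        else:
            j += m + 2
    return matches
```

For example, `tvsbs_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")` scans a short nucleotide string for one pattern. Because the shift is driven by two characters rather than one, the window can advance by up to m + 2 positions at a time, which is where the reduction in character comparisons mentioned in the summary would come from.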