A probabilistic measure for alignment-free sequence comparison

Bibliographic details
Published in:	Bioinformatics, 2004-12, Vol. 20 (18), pp. 3455-3461
Main authors:	Pham, Tuan D.; Zuegg, Johannes
Format:	Article
Language:	English
Subject headings:	Algorithms; Biological and medical sciences; Computer Simulation; Escherichia coli; Escherichia coli - genetics; Fundamental and applied biological sciences. Psychology; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Genetic; Models, Statistical; Operon - genetics; Sequence Alignment - methods; Sequence Analysis, DNA - methods; Sequence Homology, Nucleic Acid; Shigella flexneri; Shigella flexneri - genetics; Threonine - genetics

Description
Abstract:	Motivation: Alignment-free sequence comparison methods are still in the early stages of development compared with alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences that does not require alignment. The method is based on comparing the similarity/dissimilarity between two constructed Markov models. Results: The method was tested on six DNA sequences, namely the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri, plus one random sequence having the same base composition as thrA from E. coli. These results were compared with those obtained from the CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested on a more complex set of 40 DNA sequences and compared with other existing alignment-free sequence similarity measures. Availability: All datasets and the MATLAB code are available upon request from the first author.
ISSN:	1367-4803, 1460-2059, 1367-4811
DOI:	10.1093/bioinformatics/bth426
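The abstract describes the method only as a comparison between two constructed Markov models; the paper's actual similarity measure is not given in this record. As a hedged illustration of the general idea, the Python sketch below builds a first-order Markov transition matrix for each DNA sequence and compares the two matrices with a symmetric Kullback-Leibler divergence. The function names (`transition_matrix`, `symmetric_kl`, `shuffled_control`), the first-order model, the pseudocount, and the choice of divergence are all assumptions made for illustration, not Pham and Zuegg's measure. The hypothetical `shuffled_control` helper produces a random sequence with the same base composition, analogous in spirit to the random control with thrA's composition described in the abstract.

```python
import math
import random
from itertools import product

# Hypothetical illustration: a generic Markov-model comparison of two DNA
# sequences. This is NOT the measure defined by Pham and Zuegg (2004).
ALPHABET = "ACGT"


def transition_matrix(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next = b | current = a).

    Assumption: a first-order model with an additive pseudocount, so every
    probability is strictly positive and the divergence below stays finite.
    Transitions involving characters outside A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = {(a, b): pseudocount for a, b in product(ALPHABET, repeat=2)}
    for a, b in zip(seq, seq[1:]):
        if a in ALPHABET and b in ALPHABET:
            counts[(a, b)] += 1
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET)
        for b in ALPHABET:
            probs[(a, b)] = counts[(a, b)] / row_total
    return probs


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence between two transition matrices,
    averaged over the four rows (uniform row weighting is an assumption).
    Returns 0.0 for identical models and a positive value otherwise."""
    total = 0.0
    for a in ALPHABET:
        for b in ALPHABET:
            pab, qab = p[(a, b)], q[(a, b)]
            total += pab * math.log(pab / qab) + qab * math.log(qab / pab)
    return total / len(ALPHABET)


def shuffled_control(seq, rng):
    """Hypothetical helper: a random sequence with the same base composition,
    obtained by shuffling the characters of `seq`."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy sequences for illustration only; these are not the thr operon genes.
    x = "ATGCGTACGTTAGCATGCGTACGATCGATGCA"
    y = "ATGCGTACGTTAGCTTGCGTACGATCGTTGCA"
    z = shuffled_control(x, rng)
    px, py, pz = (transition_matrix(s) for s in (x, y, z))
    print("d(x, y) =", symmetric_kl(px, py))
    print("d(x, shuffled x) =", symmetric_kl(px, pz))
```

Lower values indicate more similar Markov models. A symmetric divergence is used here only so that d(x, y) equals d(y, x); other distances between transition matrices would serve equally well for a sketch, and the published measure may differ from all of them.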