Phylogeny reconstruction based on the length distribution of k-mismatch common substrings

Bibliographic details
Published in:	Algorithms for molecular biology 2017-12, Vol.12 (1), p.27-27, Article 27
Main authors:	Morgenstern, Burkhard; Schöbel, Svenja; Leimeister, Chris-André
Format:	Article
Language:	English
Subjects:	Alignment-free; Analysis; Average common substring; DNA; DNA sequencing; Kmacs; Nucleotide sequencing; Pattern matching; Phylogeny
Online access:	Full text

Description
Abstract:	Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.
ISSN:	1748-7188
DOI:	10.1186/s13015-017-0118-8
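
Illustrative note: the abstract refers to the length distribution of k-mismatch common substrings. The following is a minimal brute-force sketch of how such a distribution could be tabulated, assuming one simple reading of the term: for each start position in the first sequence, take the longest gap-free extension against any start position in the second sequence that contains at most k mismatches. The function names k_mismatch_length and length_distribution are hypothetical, the quadratic search is for illustration only, and this is neither the kmacs algorithm nor the estimator described in the paper.

```python
from collections import Counter


def k_mismatch_length(s1, i, s2, j, k):
    """Length of the gap-free match starting at s1[i] and s2[j]
    that contains at most k mismatches (assumed definition)."""
    mismatches = 0
    length = 0
    while i + length < len(s1) and j + length < len(s2):
        if s1[i + length] != s2[j + length]:
            if mismatches == k:
                break  # stop just before mismatch number k + 1
            mismatches += 1
        length += 1
    return length


def length_distribution(s1, s2, k):
    """Histogram mapping a length to the number of start positions in s1
    whose longest k-mismatch extension against s2 has that length."""
    counts = Counter()
    for i in range(len(s1)):
        # Brute force over all start positions in s2 (quadratic time).
        best = max(
            (k_mismatch_length(s1, i, s2, j, k) for j in range(len(s2))),
            default=0,
        )
        counts[best] += 1
    return counts
```

Calling length_distribution(seq1, seq2, k) on two DNA strings returns a Counter whose keys are lengths and whose values are counts; locating a local maximum in that histogram and converting its position into a substitution rate is the subject of the paper and is not attempted here.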