Sequence alignment as hypothesis testing

Bibliographic details
Published in:	Journal of Computational Biology, May 2011, Vol. 18 (5), pp. 677-691
Main authors:	Meng, Lu; Sun, Fengzhu; Zhang, Xuegong; Waterman, Michael S
Format:	Article
Language:	English
Subject terms:	Algorithms; Alignment; Amino acids; Computation; DNA sequencing; Gene expression; Gene targeting; Likelihood Functions; Mathematical analysis; Mathematical models; Mathematics; Methods; Models, Theoretical; Nucleotide sequencing; Physiological aspects; Properties; Scoring; Segments; Sequence Alignment; Similarity; Statistical distributions

Description
Abstract:	Sequence alignment depends on the scoring function that defines similarity between pairs of letters. For local alignment, the computational algorithm searches for the most similar segments in the sequences according to the scoring function. The choice of this scoring function is important for correctly detecting segments of interest. We formulate sequence alignment as a hypothesis testing problem, and conduct extensive simulation experiments to study the relationship between the scoring function and the distribution of aligned pairs within the aligned segment under this framework. We cut through the many ways to construct scoring functions and show that any scoring function with negative expectation used in local alignment corresponds to a hypothesis test between the background distribution of sequence letters and a statistical distribution of letter pairs determined by the scoring function. The results indicate that the log-likelihood ratio scoring function is statistically most powerful and has the highest accuracy for detecting the segments of interest that are defined by the statistical distribution of aligned letter pairs.
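The abstract's central claim, that a local-alignment scoring function with negative expected score implicitly tests the background letter distribution against a target distribution of aligned pairs, can be illustrated with the standard Karlin-Altschul relation q_ij = p_i p_j exp(lambda s_ij), where lambda > 0 solves sum_ij p_i p_j exp(lambda s_ij) = 1. The sketch below is an illustration under stated assumptions, not the authors' code: the uniform DNA background, the `make_target` match probability, the +1/-1 scheme, and the gap penalty of -2 are all hypothetical choices, and every function name is made up for this example. It builds log-likelihood ratio scores s_ij = ln(q_ij / (p_i p_j)), checks that their expectation is negative, recovers the target distribution implied by a scoring scheme, and runs a basic Smith-Waterman local alignment.

```python
import math

# Illustrative sketch only; all names and parameters below are hypothetical.
ALPHABET = "ACGT"
BACKGROUND = {a: 0.25 for a in ALPHABET}  # assumed uniform background p_i


def make_target(match_prob):
    """Hypothetical target q_ij: total mass match_prob spread over identical pairs."""
    n = len(ALPHABET)
    same = match_prob / n
    diff = (1.0 - match_prob) / (n * n - n)
    return {(a, b): same if a == b else diff for a in ALPHABET for b in ALPHABET}


def llr_scores(target, background):
    """Log-likelihood ratio scores s_ij = ln(q_ij / (p_i p_j))."""
    return {(a, b): math.log(q / (background[a] * background[b]))
            for (a, b), q in target.items()}


def expected_score(scores, background):
    """Expected pair score under the null model of independent background letters."""
    return sum(background[a] * background[b] * s for (a, b), s in scores.items())


def implied_lambda(scores, background, iters=200):
    """Bisection for the unique lambda > 0 with sum_ij p_i p_j exp(lambda s_ij) = 1."""
    if expected_score(scores, background) >= 0 or max(scores.values()) <= 0:
        raise ValueError("need negative expected score and at least one positive score")

    def f(lam):
        return sum(background[a] * background[b] * math.exp(lam * s)
                   for (a, b), s in scores.items()) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:  # grow the bracket until f(hi) > 0
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def implied_target(scores, background):
    """Target distribution q_ij = p_i p_j exp(lambda s_ij) implied by a scoring scheme."""
    lam = implied_lambda(scores, background)
    return lam, {(a, b): background[a] * background[b] * math.exp(lam * s)
                 for (a, b), s in scores.items()}


def local_alignment_score(x, y, scores, gap=-2.0):
    """Smith-Waterman best local score with a linear gap penalty (gap value assumed)."""
    prev = [0.0] * (len(y) + 1)
    best = 0.0
    for i in range(1, len(x) + 1):
        cur = [0.0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + scores[(x[i - 1], y[j - 1])],
                         prev[j] + gap,
                         cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    llr = llr_scores(make_target(0.7), BACKGROUND)
    print(expected_score(llr, BACKGROUND))  # negative: equals -KL(p_i p_j || q_ij)
    print(implied_target(llr, BACKGROUND)[0])  # lambda recovers 1 for LLR scores

    pm = {(a, b): 1.0 if a == b else -1.0 for a in ALPHABET for b in ALPHABET}
    lam, q = implied_target(pm, BACKGROUND)
    print(lam)  # ln(3), about 1.0986
    print(sum(v for (a, b), v in q.items() if a == b))  # about 0.75 on identical pairs
    print(local_alignment_score("TTACGTAA", "GGACGTCC", pm))  # 4.0: shared segment ACGT
```

The +1/-1 case shows the abstract's point in miniature: a scheme that was not built as a log-likelihood ratio still implies a specific target distribution (under these assumptions, about 75% identical pairs), so choosing scores amounts to choosing which kind of segment the local alignment is tuned to detect.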
ISSN:	1066-5277 1557-8666
DOI:	10.1089/cmb.2010.0328