Matching among multiple random sequences

In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses setti...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Bulletin of mathematical biology 1997-05, Vol.59 (3), p.483-496
Hauptverfasser: Naus, J.I, Sheng, K
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments in common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of "unusualness of multiple matches" to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.
ISSN:0092-8240
1522-9602
DOI:10.1007/BF02459461