Compositional spectrum—revealing patterns for genomic sequence characterization and comparison
Published in: | Physica A 2002-09, Vol.312 (3), p.447-457 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Abstract: | In this paper we propose a natural approach to characterizing genomic sequences, based on occurrences of fixed-length words (strings over the alphabet {A, C, G, T}) from a sufficiently large set W of arbitrary (in the general case) words. According to our approach, any genomic sequence can be characterized by a histogram of the frequencies of imperfect matches of words from the set W; this histogram is called a compositional spectrum (CS). The specificity of CSs is manifested in a reasonable similarity between spectra obtained on different stretches of the same genome and, simultaneously, in a broad range of dissimilarities between the spectral characteristics of different genomes. The proposed approach may have various applications in intra- and intergenomic sequence comparisons. |
---|---|
ISSN: | 0378-4371, 1873-2119 |
DOI: | 10.1016/S0378-4371(02)00843-9 |
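The compositional spectrum described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the record itself does not specify. "Imperfect matching" is taken to mean that a length-k window of the sequence differs from a word of W in at most a given number of positions (Hamming distance). W is drawn at random. The sequence is uppercase A/C/G/T only. The helper names `random_word_set`, `compositional_spectrum` and `spectrum_distance` are hypothetical. The mean-absolute-difference comparison is a stand-in chosen for simplicity and is not the authors' dissimilarity measure.

```python
import random
from typing import Dict, List

ALPHABET = "ACGT"


def random_word_set(n_words: int, k: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: draw n_words distinct random k-letter words (the set W)."""
    if n_words > len(ALPHABET) ** k:
        raise ValueError("cannot draw more distinct words than 4**k")
    rng = random.Random(seed)
    words = set()
    while len(words) < n_words:
        words.add("".join(rng.choice(ALPHABET) for _ in range(k)))
    return sorted(words)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq: str, words: List[str], max_mismatches: int = 1) -> Dict[str, float]:
    """For each word w in W, the fraction of length-k windows of seq that match w
    with at most max_mismatches substitutions (assumed notion of imperfect matching)."""
    k = len(words[0])
    if any(len(w) != k for w in words):
        raise ValueError("all words in W must have the same length")
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than the word length")
    windows = [seq[i:i + k] for i in range(n_windows)]
    # Each boolean comparison counts as 0 or 1 in the sum.
    return {
        w: sum(hamming(win, w) <= max_mismatches for win in windows) / n_windows
        for w in words
    }


def spectrum_distance(s1: Dict[str, float], s2: Dict[str, float]) -> float:
    """Stand-in dissimilarity: mean absolute frequency difference over the shared W."""
    return sum(abs(s1[w] - s2[w]) for w in s1) / len(s1)


if __name__ == "__main__":
    W = random_word_set(n_words=50, k=6, seed=1)
    rng = random.Random(42)
    # Synthetic random sequence used only for illustration, not real genomic data.
    genome = "".join(rng.choice(ALPHABET) for _ in range(4000))
    cs_first = compositional_spectrum(genome[:2000], W)
    cs_second = compositional_spectrum(genome[2000:], W)
    print(f"distance between two halves: {spectrum_distance(cs_first, cs_second):.4f}")
```

Computing one spectrum per stretch and comparing the resulting vectors mirrors the intra-genome versus inter-genome comparison described in the abstract. On real data one would compare stretches of the same genome against stretches of different genomes. Word length, the size of W and the mismatch threshold are all free parameters in this sketch.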