Compositional spectrum—revealing patterns for genomic sequence characterization and comparison

Bibliographic details
Published in: Physica A 2002-09, Vol. 312 (3), pp. 447-457
Main authors: Kirzhner, Valery M.; Korol, Abraham B.; Bolshoy, Alexander; Nevo, Eviatar
Format: Article
Language: English
Description
Abstract: In this paper we propose a natural approach to characterizing genomic sequences, based on occurrences of fixed-length words (strings over the alphabet {A, C, G, T}) from a sufficiently large set W of arbitrary (in the general case) words. According to our approach, any genomic sequence can be characterized by a histogram of frequencies of imperfect matching of words from the set W; this histogram is called a compositional spectrum (CS). The specificity of CSs is manifest in a reasonable similarity of spectra obtained on different stretches of the same genome and, simultaneously, in a broad range of dissimilarities between spectral characteristics of different genomes. The proposed approach may have various applications in intra- and intergenomic sequence comparisons.
ISSN: 0378-4371, 1873-2119
DOI: 10.1016/S0378-4371(02)00843-9