Visualizing genomic data: The mixing perspective

Bibliographic details
Published in: BioSystems, 2023-02, Vol. 224, Article 104839
Main authors: Seitz, William; Kirwan, A.D.; Brčić-Kostić, Krunoslav; Mitrikeski, Petar Tomev; Seitz, P.K.
Format: Article
Language: English
Description
Abstract: We report on a novel way to visualize genomic data. By treating each genome coding sequence (cds) as a set drawn from the N = 61 non-stop codons, one obtains a partition of the total number of codons in that cds. Partitions exhibit a statistical property known as mixing character, which characterizes how mixed the partition is. Mixing characters have been shown mathematically to obey a partial order known as majorization (Ruch, 1975). In previous work (Seitz and Kirwan, 2022) we developed an approach that combines mixing and entropy and is visualized as a scatter plot. Applying it to all 1,121,505 partitions of 61 codons produces a plot we call the theoretical mixing space, TGMS. A normalization procedure is developed here and applied to real genomic data to produce the genome mixing signature, GMS. Example GMSs of 19 species, including Homo sapiens, are shown and discussed.
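The first step described in the abstract, turning a cds into a partition of its codon total, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the standard genetic code (stop codons TAA, TAG, TGA), an in-frame cds, and it silently drops stop codons and codons containing ambiguous bases. The partition is returned as all 61 counts sorted in non-increasing order, zeros included. The helper names `cds_partition` and `shannon_entropy` are hypothetical, and the Shannon entropy shown is only an illustrative stand-in; the entropy measure used in the paper may differ.

```python
import math
from collections import Counter
from itertools import product

# Assumption: standard genetic code, so these three are the stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# The N = 61 non-stop (sense) codons.
SENSE_CODONS = tuple(
    "".join(bases) for bases in product("TCAG", repeat=3)
    if "".join(bases) not in STOP_CODONS
)
_SENSE_SET = frozenset(SENSE_CODONS)


def cds_partition(cds: str) -> list[int]:
    """Count each non-stop codon of an in-frame cds and return the 61 counts
    sorted in non-increasing order, i.e. a partition of the codon total."""
    seq = cds.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    in_frame = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Stop codons and codons with ambiguous bases are not counted.
    counts = Counter(codon for codon in in_frame if codon in _SENSE_SET)
    return sorted((counts[c] for c in SENSE_CODONS), reverse=True)


def shannon_entropy(partition: list[int]) -> float:
    """Shannon entropy (natural log) of the codon-frequency distribution.
    Hypothetical stand-in; not necessarily the paper's entropy."""
    total = sum(partition)
    return -sum((k / total) * math.log(k / total) for k in partition if k > 0)


p = cds_partition("ATGATGAAATAA")
print(p[:3], sum(p))  # [2, 1, 0] 3
print(round(shannon_entropy(p), 4))  # 0.6365
```

Keeping the zero counts means every cds maps to a vector of the same length, 61, which makes partitions from different sequences directly comparable.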
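The majorization partial order cited from Ruch (1975) can be checked with prefix sums. A minimal sketch, assuming both partitions sum to the same total: partition λ majorizes μ when, after sorting both in non-increasing order and padding with zeros, every prefix sum of λ is at least the corresponding prefix sum of μ. In this sketch the majorized partition is treated as the more mixed one (the all-ones partition is the most mixed, the single-part partition the least). Because the order is only partial, some pairs are incomparable. The names `majorizes` and `compare` are hypothetical.

```python
from itertools import accumulate


def majorizes(lam: list[int], mu: list[int]) -> bool:
    """True if lam majorizes mu: equal totals and, after sorting in
    non-increasing order and zero-padding, every prefix sum of lam is
    >= the matching prefix sum of mu."""
    if sum(lam) != sum(mu):
        raise ValueError("majorization compares partitions of the same total")
    width = max(len(lam), len(mu))
    a = sorted(lam, reverse=True) + [0] * (width - len(lam))
    b = sorted(mu, reverse=True) + [0] * (width - len(mu))
    return all(x >= y for x, y in zip(accumulate(a), accumulate(b)))


def compare(lam: list[int], mu: list[int]) -> str:
    """Classify a pair of partitions; the majorized one is treated as more mixed."""
    forward, backward = majorizes(lam, mu), majorizes(mu, lam)
    if forward and backward:
        return "equal"
    if forward:
        return "mu is more mixed"
    if backward:
        return "lam is more mixed"
    return "incomparable"


print(compare([6], [1, 1, 1, 1, 1, 1]))  # mu is more mixed
print(compare([3, 1, 1, 1], [2, 2, 2]))  # incomparable
```

The second pair shows why majorization alone cannot rank every partition: the prefix sums of (3, 1, 1, 1) are 3, 4, 5, 6 and those of (2, 2, 2) are 2, 4, 6, 6, so neither dominates the other.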
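The figure of 1,121,505 quoted in the abstract equals p(61), the number of integer partitions of 61. A short dynamic program reproduces that count. This sketch only verifies the number; it does not build the theoretical mixing space itself, and the function name `partition_count` is hypothetical.

```python
def partition_count(n: int) -> int:
    """Number of integer partitions p(n), via the coin-change style
    dynamic program over allowed part sizes 1..n."""
    ways = [1] + [0] * n  # ways[t] = partitions of t using the part sizes seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


print(partition_count(61))  # 1121505
```

Processing part sizes in the outer loop counts each multiset of parts once, so order within a partition does not produce duplicates.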
ISSN: 0303-2647, 1872-8324
DOI: 10.1016/j.biosystems.2023.104839