Spatial constrains and information content of sub-genomic regions of the human genome

Bibliographic details
Published in: iScience 2021-02, Vol. 24 (2), p. 102048, Article 102048
Authors: Karakatsanis, Leonidas P., Pavlos, Evgenios G., Tsoulouhas, George, Stamokostas, Georgios L., Mosbruger, Timothy, Duke, Jamie L., Pavlos, George P., Monos, Dimitri S.
Format: Article
Language: English
Description
Abstract: Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique), with the purpose of asking questions about the segmental organization of the human genome within the size distribution of these sequences. For this, we developed an integrated methodology based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a technical index that integrates the generated information, which we introduce and name the complexity factor (COFA). Our analysis revealed that the size distribution of the genomic regions within chromosomes is not random but follows patterns with characteristic features, seen through its complexity character, that are part of the dynamics of the whole genome. Finally, this picture of dynamics in DNA is recognized with high accuracy using ML tools for clustering, classification, and prediction.
Highlights:
• The lengths of DNA subgenomic entities satisfy Tsallis non-extensive statistics
• The size distribution of the subgenomic entities within chromosomes follows specific patterns
• A technical index, COFA, was introduced to characterize the degree of complexity
• The complexity behavior of DNA is identifiable using ML approaches
Subjects: Biocomputational Method; Bioinformatics; Biological Sciences; Genomic Analysis; Genomics; Statistical Physics; Techniques in Genetics
ISSN: 2589-0042
DOI: 10.1016/j.isci.2021.102048
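The abstract names two generic building blocks without showing them: phase-space reconstruction from a series (commonly realized as a Takens-style time-delay embedding) and Tsallis non-extensive statistics. The Python/NumPy sketch below illustrates only those textbook techniques and is not the authors' pipeline. The function names, the embedding dimension `m`, the delay `tau`, the entropic index `q`, the 32-bin histogram of log10 lengths, and the synthetic lognormal lengths are all illustrative assumptions; COFA and the ML models are not reproduced here.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Hypothetical helper: time-delay embedding of a 1-D series.

    Row i is (x[i], x[i+tau], ..., x[i+(m-1)*tau]); assumes m >= 1 and tau >= 1.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Column j holds the series shifted by j*tau samples.
    return np.column_stack([x[j * tau : j * tau + n_vectors] for j in range(m)])


def tsallis_entropy(weights, q):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1), with k = 1.

    `weights` may be raw counts; they are normalized to probabilities.
    """
    p = np.asarray(weights, dtype=float)
    p = p[p > 0]  # empty bins contribute nothing
    p = p / p.sum()
    if abs(q - 1.0) < 1e-9:
        # The limit q -> 1 recovers the Shannon entropy (in nats).
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


def length_entropy(lengths, q, bins=32):
    """Hypothetical helper: S_q of the histogram of log10 segment lengths."""
    counts, _ = np.histogram(np.log10(lengths), bins=bins)
    return tsallis_entropy(counts, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for ordered segment lengths (bp); NOT genomic data.
    lengths = rng.lognormal(mean=7.0, sigma=1.5, size=5000).astype(int) + 1
    emb = delay_embed(lengths, m=3, tau=1)
    print(emb.shape)  # (4998, 3)
    print(length_entropy(lengths, q=1.5))
```

Treating the ordered segment lengths along a chromosome as a series is what makes a delay embedding applicable in this sketch. The q -> 1 branch is kept so the Tsallis function can be sanity-checked against the Shannon entropy.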