From RNA to histological images: linking the transcriptome with human phenotypes through statistical learning

Bibliographic details
Main author: Muñoz Aguirre, Manuel
Format: Dissertation
Language: English
Description
Summary: Thesis by compendium of publications. Genomic datasets are fundamental to broadening our understanding of human biology in the context of health and disease. However, the high-dimensional nature of gene expression and other molecular traits poses a challenge when attempting to find associations of these data types with human phenotypes. To this end, this thesis relies on statistical learning tools to mitigate the curse of dimensionality and link the human transcriptome with phenotypes at different orders of complexity: from RNA, to computationally inferred cell type enrichments, and finishing with histological images and their corresponding free-text descriptions. We make four specific contributions. First, we built computational models based on gene expression of post-mortem human tissues in order to derive estimates of the post-mortem interval. Second, we redefined the basic histological types of tissue classification based on five broad transcriptional programs which define major cell types: epithelial, endothelial, mesenchymal, neural, and blood. We generated computational estimates for the enrichment of these major cell types and validated them through the analysis of histological images and free-text pathology reports, finding that departures from normal cellular enrichment correlate with disease-associated histological phenotypes. Third, we characterized the landscape of human sex-differential gene expression, finding that effects are small but ubiquitous and tend to be tissue-specific, with some of these genes being involved in biological and molecular functions related to disease and clinical phenotypes. Fourth, we proposed an in silico methodology to spatially deconvolute gene expression from matched sample pairs of whole slide histological images and bulk RNA-seq gene expression, with the goal of replicating the spatial transcriptomics experimental technology.
Within this study, we also developed a software tool to effortlessly process whole slide histological images into tiles for machine learning applications.
Genomic datasets are fundamental to broadening our understanding of human biology in the context of health and disease. However, the high dimensionality of gene expression and other molecular traits makes it challenging to link these data types with human phenotypes. This thesis relies on statistical learning tools to mitigate the problem of dimensionality and to link the transcriptome...