Statistical and computational methods for spatial and regulatory genomics

Bibliographic Details
Main Author:	Zou, Luli S
Format:	Dissertation
Language:	English
Subjects:	Biology Biostatistics epigenomics Genetics genomics single cell transcriptomics

Description
Abstract:	The complexity and scale of high-throughput genomics experiments have grown significantly over the past few years. Recently developed sequencing technologies now enable the profiling of transcriptomes at different 2D spatial locations across tissues; the reconstruction of protein- and RNA-associated DNA folding patterns from 3D genome data; and the measurement of both the transcriptome and the epigenome from the same cell. While some biological questions of interest can be answered from these data using standard computational tools, others require methodological innovation to deal with the high dimensionality and sparsity that come with the increased resolution of these new technologies. In this dissertation, I present three novel statistical and computational methods motivated by fundamental biological questions in spatial and regulatory genomic data. Chapter 1 presents a method for detecting allele-specific expression in 2D spatial transcriptomics. Spatial transcriptomics data are highly sparse and challenging to analyze, since each observation can contain a mixture of cell types. Our method uses a generalized linear model framework to account for cell type mixtures and detect spatial allele-specific expression within each cell type. We demonstrate the utility of the method through simulations and on Slide-seq data generated from the mouse hippocampus. The findings facilitated by our method provide new insight into the uncharacterized landscape of spatial and cell-type-specific allele-specific expression in the mouse hippocampus. Chapter 2 introduces a method for deconvolving chromatin binding signal in 3D genome data, such as HiChIP or RD-SPRITE. In these data, interest lies in identifying the precise binding or interaction locations of proteins or RNAs with DNA, as well as in estimating the strength of these associations. We use a probabilistic model in which the observed DNA-DNA contacts directly convolve the true underlying interaction signal. We show in RD-SPRITE data that our method accurately deconvolves 1D lncRNA signal to more specific locations consistent with prior biological knowledge. We further show in HiChIP data that our method increases power for downstream analyses such as differential loop detection. Chapter 3 presents a method for estimating gene regulatory networks from paired single-cell RNA-sequencing and single-cell ATAC-sequencing data. Our method uses a variational auto-encoder framework to jointly infer l