Computational Modelling of Gene Regulation in Cancer: Coding the noncoding genome

Bibliographic details
Author: Umer, Husen Muhammad
Format: Dissertation
Language: English
Description
Summary: Technological advancements have enabled quantification of processes within and around us. The information stored within our bodies translates into petabytes of data. Processing and learning from such data requires comprehensive computational programs and software systems. We developed software programs to systematically investigate the process of gene regulation in the human genome. Gene regulation is a complex process in which several genomic elements control the expression of a gene by recruiting many transcription factor (TF) proteins. The TFs recognize specific DNA sequences known as motifs. DNA mutations in regulatory elements, and particularly in TF motifs, may cause gene deregulation. Therefore, defining the landscape of regulatory elements and their roles in cancer and complex diseases is of major importance. We developed an algorithm (tfNet) to identify regulatory elements based on transcription factor binding sites. tfNet identified nearly 144,000 regulatory elements in five human cell lines. Investigating these elements, we identified TF interaction networks and enrichment of many GWAS SNPs. We also defined the regulatory landscape for other conditions and species. Next, we investigated the role of regulatory elements in cancer. Cancer is initiated and developed by genetic aberrations in the genome. Genetic changes that are present in a cancer genome are obtained through whole-genome sequencing technologies. We analyzed somatic mutations that had been detected in 326 whole genomes of liver cancer patients. Our results indicated 907 candidate mutations affecting TF motifs. Genome-wide alignment of the mutated motifs revealed a significant enrichment of mutations in a highly conserved position of the CTCF motif. Gene expression analysis indicated disruption of topologically associated domains in the mutated samples. We also confirmed the mutational pattern in pancreatic, gastric and esophageal cancers.
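To illustrate what a mutation "affecting a TF motif" can mean in practice, the following minimal Python sketch scores a short sequence against a position weight matrix and reports how a single-base substitution changes that score. It is not the method used in the dissertation: the four-position matrix, the uniform background frequency, and the function names `log_odds_score` and `mutation_effect` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-position motif: per-position base probabilities (toy values,
# not taken from any real TF or from the dissertation).
TOY_PWM = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_score(sequence, pwm=TOY_PWM, background=BACKGROUND):
    """Sum of log2(p_motif / p_background) over the motif positions."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence length must equal motif length")
    return sum(
        math.log2(column[base] / background)
        for column, base in zip(pwm, sequence.upper())
    )


def mutation_effect(reference, position, alt_base, pwm=TOY_PWM):
    """Score change (mutant minus reference) for a single-base substitution."""
    mutant = reference[:position] + alt_base + reference[position + 1:]
    return log_odds_score(mutant, pwm) - log_odds_score(reference, pwm)


if __name__ == "__main__":
    # Replace the preferred G at position 2 (0-based) with T.
    print(mutation_effect("ACGT", 2, "T"))
```

A negative value means the substitution makes the sequence a weaker match to the motif; in this toy matrix, substitutions away from the preferred base at any position lower the score by the same amount, whereas a real motif would penalize changes at highly conserved positions more strongly.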
Finally, enrichment of cancer-associated gene sets and pathways suggested a major role of noncoding mutations in cancer. To systematically analyze DNA mutations in TF motifs, we developed an online database system (funMotifs). Publicly available datasets were collected for thousands of experiments. The datasets were integrated using a logistic regression model. Functionality annotations and scores for motifs of 519 TFs were derived. The database allows for identification of variants affecting functional motifs in a selected tissue type. Finally, a comprehensive anal