Statistical and Machine Learning Approaches for Family History Data
Main author: | |
---|---|
Format: | Dissertation |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Germline mutations in many genes have been shown to increase the risk of developing cancer, and numerous statistical models have been developed to predict genetic susceptibility to cancer. Mendelian models predict risk by combining family histories with estimated cancer penetrances (the age- and sex-specific risk of cancer given the mutation genotype) and mutation prevalences. This dissertation uses statistical and machine learning tools to improve Mendelian risk prediction models and explores the assumptions these models make.
Mendelian models assume conditional independence between family members' cancer ages given genotype and sex. However, this assumption is often violated because residual risk heterogeneity remains even after accounting for the mutations in the model. In chapter 1, we aim to account for this heterogeneity by incorporating a frailty model with a family-specific frailty vector that modifies the cancer hazard function. We apply the proposed approach to directly improve breast cancer prediction in BRCAPRO, a Mendelian model that accounts for inherited mutations in the BRCA1 and BRCA2 genes to predict breast and ovarian cancer. We evaluate the proposed model's performance in simulations and in real data from the Cancer Genetics Network and show improvements in model calibration and discrimination. We also discuss other approaches for incorporating frailties, along with their strengths and limitations.
In chapter 2, we continue to explore this assumption by determining the extent and sources of the heterogeneity across and within families. We quantify the heterogeneity by evaluating the ratio between the number of observed cancer cases in a family and the number of expected cases under a model where risk is assumed to be the same across families. We perform this analysis for both carriers and non-carriers in each family and visualize the results. We then introduce frailty models as a method to generatively mimic risk heterogeneity, and use synthetic data to explore the impact of various sources of the observed heterogeneity. We apply this approach to data on colorectal cancer in families carrying mutations in Lynch syndrome genes from Creighton University's Hereditary Cancer Center. We show that colorectal cancer risk in carriers can vary widely across families, and that this variation is not matched by a corresponding variation in the non-carriers from the same families. This suggests that the sources |
---|
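The observed-to-expected comparison described in chapter 2 can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the scalar gamma-distributed frailty (the abstract mentions a frailty vector and does not name a distribution), the constant baseline cumulative hazard, and all parameter values are assumptions chosen only for illustration. Each simulated family draws a frailty with mean 1 that multiplies the hazard, and the expected count is computed under a model in which every family has the same risk.

```python
import math
import random


def simulate_family_oe(n_families=2000, family_size=10,
                       baseline_cum_hazard=0.3, frailty_variance=0.5, seed=0):
    """Return per-family observed/expected case ratios.

    All parameters are hypothetical. A frailty_variance of 0 means every
    family shares the same risk (no heterogeneity).
    """
    rng = random.Random(seed)
    # Expected cases assume identical risk in every family (frailty = 1).
    p_homogeneous = 1.0 - math.exp(-baseline_cum_hazard)
    expected = family_size * p_homogeneous
    ratios = []
    for _ in range(n_families):
        if frailty_variance > 0:
            # Gamma frailty with mean 1 and the requested variance (assumed form).
            w = rng.gammavariate(1.0 / frailty_variance, frailty_variance)
        else:
            w = 1.0
        # The frailty multiplies the cumulative hazard for the whole family.
        p_family = 1.0 - math.exp(-w * baseline_cum_hazard)
        observed = sum(rng.random() < p_family for _ in range(family_size))
        ratios.append(observed / expected)
    return ratios


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    no_frailty = simulate_family_oe(frailty_variance=0.0)
    with_frailty = simulate_family_oe(frailty_variance=1.0)
    print("O/E variance without frailty:", sample_variance(no_frailty))
    print("O/E variance with frailty:   ", sample_variance(with_frailty))
```

In this toy setup, a wider spread of observed-to-expected ratios across families than chance alone would produce is the kind of signal that points to residual risk heterogeneity.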