Prioritization of Retinal Disease Genes: An Integrative Approach

Bibliographic details
Published in: Human Mutation, 2013-06, Vol. 34 (6), p. 853-859
Main authors: Wagner, Alex H., Taylor, Kyle R., DeLuca, Adam P., Casavant, Thomas L., Mullins, Robert F., Stone, Edwin M., Scheetz, Todd E., Braun, Terry A.
Format: Article
Language: English
Description
Abstract: The discovery of novel disease-associated variations in genes is often a daunting task in highly heterogeneous disease classes. We seek a generalizable algorithm that integrates multiple publicly available genomic data sources in a machine-learning model for the prioritization of candidates identified in patients with retinal disease. To approach this problem, we generate a set of feature vectors from publicly available microarray, RNA-seq, and ChIP-seq datasets of biological relevance to retinal disease. These features capture patterns in gene expression specificity among tissues of the body and the eye, in addition to photoreceptor-specific signals from the CRX transcription factor. Using these features, we describe a novel algorithm, positive and unlabeled learning for prioritization (PULP). This article compares several popular supervised learning techniques as the regression function for PULP. The results demonstrate a highly significant enrichment for previously characterized disease genes using a logistic regression method. Finally, a comparison of PULP with the popular gene prioritization tool ENDEAVOUR shows superior prioritization of retinal disease genes from previous studies. The Java source code, compiled binary, assembled feature vectors, and instructions are available online at https://github.com/ahwagner/PULP.

Using a diverse set of publicly available experiments, we construct a model that identifies patterns among known retinal disease-causing genes. We use this model to prioritize variants identified in exome studies of patients diagnosed with retinitis pigmentosa (RP). We show that our model significantly enriches RP variant candidate lists and outperforms state-of-the-art methods in generalized gene prediction strategies.
ISSN: 1059-7794, 1098-1004
DOI: 10.1002/humu.22317
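
The abstract describes ranking candidate genes with positive and unlabeled learning, using logistic regression as the scoring function. The sketch below illustrates that general idea only and is not the authors' implementation (their Java code is at the URL given in the abstract). It assumes a generic bagging scheme: known disease genes are positives, random subsets of unlabeled genes serve as provisional negatives, and each unlabeled gene is scored by the models that did not train on it. The gene names, the two features (a retina expression specificity and a CRX binding signal), and all values are hypothetical placeholders.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(model, x):
    """Probability-like score of feature vector x under a (weights, bias) model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [predict((w, b), xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def prioritize(positives, unlabeled, rounds=20, seed=0):
    """Rank unlabeled genes by mean out-of-bag score (higher = more positive-like)."""
    rng = random.Random(seed)
    names = sorted(unlabeled)
    k = max(1, len(names) // 2)          # size of each provisional-negative bag
    pos_X = list(positives.values())
    models = []
    for _ in range(rounds):
        bag = set(rng.sample(names, k))  # unlabeled genes treated as negatives
        X = pos_X + [unlabeled[g] for g in sorted(bag)]
        y = [1.0] * len(pos_X) + [0.0] * k
        models.append((bag, fit_logistic(X, y)))
    ranking = []
    for g in names:
        # Score with models that never saw g; fall back to all models if none.
        oob = [m for bag, m in models if g not in bag] or [m for _, m in models]
        ranking.append((g, sum(predict(m, unlabeled[g]) for m in oob) / len(oob)))
    return sorted(ranking, key=lambda t: t[1], reverse=True)

# Hypothetical toy data: [retina expression specificity, CRX binding signal].
positives = {
    "known_1": [0.90, 0.80], "known_2": [0.80, 0.90],
    "known_3": [0.85, 0.70], "known_4": [0.95, 0.90],
}
unlabeled = {
    "cand_A": [0.90, 0.85],  # resembles the known positives
    "cand_B": [0.10, 0.20], "cand_C": [0.20, 0.10], "cand_D": [0.15, 0.30],
    "cand_E": [0.30, 0.20], "cand_F": [0.05, 0.10],
}

ranking = prioritize(positives, unlabeled)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setup the candidate whose features resemble the known positives should rise to the top of the list. Out-of-bag scoring is one common way to avoid scoring a gene with a model that was told the gene is a negative; whether PULP does this is not stated in the abstract.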