Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations

Bibliographic Details
Published in: Cell Systems, 2016-07, Vol. 3 (1), pp. 54-61
Main authors: Simmons, Sean; Sahinalp, Cenk; Berger, Bonnie
Format: Article
Language: English
Description
Summary: The proliferation of large genomic databases offers the potential to perform increasingly larger-scale genome-wide association studies (GWASs). Due to privacy concerns, however, access to these data is limited, greatly reducing their usefulness for research. Here, we introduce a computational framework for performing GWASs that adapts principles of differential privacy (a cryptographic theory that facilitates secure analysis of sensitive data) to both protect private phenotype information (e.g., disease status) and correct for population stratification. This framework enables us to produce privacy-preserving GWAS results based on EIGENSTRAT and linear mixed model (LMM)-based statistics, both of which correct for population stratification. We test our differentially private statistics, PrivSTRAT and PrivLMM, on simulated and real GWAS datasets and find they are able to protect privacy while returning meaningful results. Our framework can be used to securely query private genomic datasets to discover which specific genomic alterations may be associated with a disease, thus increasing the availability of these valuable datasets.

• We introduce a novel variant of differential privacy tailored to genomic databases
• We enable privacy-preserving GWASs in the presence of population stratification
• We implement and test these algorithms on numerous real and synthetic datasets

Simmons et al. introduce a scalable framework for allowing privacy-preserving queries on genomic databases using cutting-edge genome-wide association study statistics that account for population stratification.
ISSN: 2405-4712, 2405-4720
DOI: 10.1016/j.cels.2016.04.013