Data Sanitization to Reduce Private Information Leakage from Functional Genomics
Published in: | Cell 2020-11, Vol.183 (4), p.905-917.e16 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
• Surging functional genomics data necessitates improved data-sharing modes
• Quantification of private information in these data is done via linkage attacks
• A data sanitization protocol grounded in privacy and utility is developed
• The sanitized format is compatible with existing file formats and pipelines
Growing functional genomics data puts individual privacy at risk via linkage attacks, the risk of which is quantified and can be sanitized using a privacy-preserving data format. |
---|---|
ISSN: | 0092-8674 1097-4172 |
DOI: | 10.1016/j.cell.2020.09.036 |