Data Sanitization to Reduce Private Information Leakage from Functional Genomics

The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns d...

Bibliographic details
Published in: Cell 2020-11, Vol.183 (4), p.905-917.e16
Main authors: Gürsoy, Gamze; Emani, Prashant; Brannon, Charlotte M.; Jolanki, Otto A.; Harmanci, Arif; Strattan, J. Seth; Cherry, J. Michael; Miranker, Andrew D.; Gerstein, Mark
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 917.e16
container_issue 4
container_start_page 905
container_title Cell
container_volume 183
creator Gürsoy, Gamze
Emani, Prashant
Brannon, Charlotte M.
Jolanki, Otto A.
Harmanci, Arif
Strattan, J. Seth
Cherry, J. Michael
Miranker, Andrew D.
Gerstein, Mark
description The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples. • Surging functional genomics data necessitates improved data-sharing modes • Quantification of private information in these data is done via linkage attacks • A data sanitization protocol grounded in privacy and utility is developed • The sanitized format is compatible with existing file formats and pipelines. Growing functional genomics data puts individual privacy at risk via linkage attacks; this risk is quantified, and the data can be sanitized using a privacy-preserving data format.
doi_str_mv 10.1016/j.cell.2020.09.036
format Article
fullrecord <record><control><sourceid>pubmed_cross</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7672785</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0092867420312332</els_id><sourcerecordid>33186529</sourcerecordid><originalsourceid>FETCH-LOGICAL-c521t-f7b70fef9eca0b0f25a974153a75dc977728c942636e64775dde0a6660335d593</originalsourceid><addsrcrecordid>eNp9kN1KxDAQhYMo7rr6Al5IX6B1kjZJAyKIP6uw4OLPdcimU83aNkvaXdCnt2VV9MargTPnnGE-Qo4pJBSoOF0mFqsqYcAgAZVAKnbImIKScUYl2yVjAMXiXMhsRA7adgkAOed8n4zSlOaCMzUm8yvTmejRNK5zH6Zzvok6Hz1gsbYYzYPbmA6ju6b0od5uZ2jezAtGZfB1dLNu7KCaKppi42tn20OyV5qqxaOvOSHPN9dPl7fx7H56d3kxiy1ntItLuZBQYqnQGlhAybhRMqM8NZIXVkkpWW5VxkQqUGSyFwsEI4SANOUFV-mEnG97V-tFjYXFpgum0qvgahPetTdO_9007lW_-I2WQjKZ876AbQts8G0bsPzJUtADX73UA1898NWgdM-3D538vvoT-QbaG862Bux_3zgMurUOG4uFC2g7XXj3X_8nlWeNrA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Data Sanitization to Reduce Private Information Leakage from Functional Genomics</title><source>MEDLINE</source><source>Cell Press Free Archives</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>ScienceDirect Journals (5 years ago - present)</source><creator>Gürsoy, Gamze ; Emani, Prashant ; Brannon, Charlotte M. ; Jolanki, Otto A. ; Harmanci, Arif ; Strattan, J. Seth ; Cherry, J. Michael ; Miranker, Andrew D. ; Gerstein, Mark</creator><creatorcontrib>Gürsoy, Gamze ; Emani, Prashant ; Brannon, Charlotte M. ; Jolanki, Otto A. ; Harmanci, Arif ; Strattan, J. Seth ; Cherry, J. Michael ; Miranker, Andrew D. ; Gerstein, Mark</creatorcontrib><description>The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). 
The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples. [Display omitted] •Surging functional genomics data necessitates improved data-sharing modes•Quantification of private information in these data is done via linkage attacks•A data sanitization protocol grounded in privacy and utility is developed•The sanitized format is compatible with existing file formats and pipelines Growing functional genomics data puts individual privacy at risk via linkage attacks, the risk of which is quantified and can be sanitized using a privacy-preserving data format.</description><identifier>ISSN: 0092-8674</identifier><identifier>EISSN: 1097-4172</identifier><identifier>DOI: 10.1016/j.cell.2020.09.036</identifier><identifier>PMID: 33186529</identifier><language>eng</language><publisher>United States: Elsevier Inc</publisher><subject>Computer Security ; data sanitization ; functional genomics ; genome privacy ; Genome, Human ; Genomics ; Genotype ; High-Throughput Nucleotide Sequencing ; Humans ; linkage attacks ; Phenotype ; Phylogeny ; Privacy ; Reproducibility of Results ; RNA-seq ; Sequence Analysis, RNA ; Single-Cell 
Analysis ; surreptitious DNA sequencing</subject><ispartof>Cell, 2020-11, Vol.183 (4), p.905-917.e16</ispartof><rights>2020 Elsevier Inc.</rights><rights>Copyright © 2020 Elsevier Inc. All rights reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c521t-f7b70fef9eca0b0f25a974153a75dc977728c942636e64775dde0a6660335d593</citedby><cites>FETCH-LOGICAL-c521t-f7b70fef9eca0b0f25a974153a75dc977728c942636e64775dde0a6660335d593</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.cell.2020.09.036$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>230,314,780,784,885,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33186529$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Gürsoy, Gamze</creatorcontrib><creatorcontrib>Emani, Prashant</creatorcontrib><creatorcontrib>Brannon, Charlotte M.</creatorcontrib><creatorcontrib>Jolanki, Otto A.</creatorcontrib><creatorcontrib>Harmanci, Arif</creatorcontrib><creatorcontrib>Strattan, J. Seth</creatorcontrib><creatorcontrib>Cherry, J. Michael</creatorcontrib><creatorcontrib>Miranker, Andrew D.</creatorcontrib><creatorcontrib>Gerstein, Mark</creatorcontrib><title>Data Sanitization to Reduce Private Information Leakage from Functional Genomics</title><title>Cell</title><addtitle>Cell</addtitle><description>The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. 
Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples. [Display omitted] •Surging functional genomics data necessitates improved data-sharing modes•Quantification of private information in these data is done via linkage attacks•A data sanitization protocol grounded in privacy and utility is developed•The sanitized format is compatible with existing file formats and pipelines Growing functional genomics data puts individual privacy at risk via linkage attacks, the risk of which is quantified and can be sanitized using a privacy-preserving data format.</description><subject>Computer Security</subject><subject>data sanitization</subject><subject>functional genomics</subject><subject>genome privacy</subject><subject>Genome, Human</subject><subject>Genomics</subject><subject>Genotype</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>linkage attacks</subject><subject>Phenotype</subject><subject>Phylogeny</subject><subject>Privacy</subject><subject>Reproducibility of Results</subject><subject>RNA-seq</subject><subject>Sequence Analysis, RNA</subject><subject>Single-Cell Analysis</subject><subject>surreptitious DNA 
sequencing</subject><issn>0092-8674</issn><issn>1097-4172</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kN1KxDAQhYMo7rr6Al5IX6B1kjZJAyKIP6uw4OLPdcimU83aNkvaXdCnt2VV9MargTPnnGE-Qo4pJBSoOF0mFqsqYcAgAZVAKnbImIKScUYl2yVjAMXiXMhsRA7adgkAOed8n4zSlOaCMzUm8yvTmejRNK5zH6Zzvok6Hz1gsbYYzYPbmA6ju6b0od5uZ2jezAtGZfB1dLNu7KCaKppi42tn20OyV5qqxaOvOSHPN9dPl7fx7H56d3kxiy1ntItLuZBQYqnQGlhAybhRMqM8NZIXVkkpWW5VxkQqUGSyFwsEI4SANOUFV-mEnG97V-tFjYXFpgum0qvgahPetTdO_9007lW_-I2WQjKZ876AbQts8G0bsPzJUtADX73UA1898NWgdM-3D538vvoT-QbaG862Bux_3zgMurUOG4uFC2g7XXj3X_8nlWeNrA</recordid><startdate>20201112</startdate><enddate>20201112</enddate><creator>Gürsoy, Gamze</creator><creator>Emani, Prashant</creator><creator>Brannon, Charlotte M.</creator><creator>Jolanki, Otto A.</creator><creator>Harmanci, Arif</creator><creator>Strattan, J. Seth</creator><creator>Cherry, J. Michael</creator><creator>Miranker, Andrew D.</creator><creator>Gerstein, Mark</creator><general>Elsevier Inc</general><scope>6I.</scope><scope>AAFTH</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope></search><sort><creationdate>20201112</creationdate><title>Data Sanitization to Reduce Private Information Leakage from Functional Genomics</title><author>Gürsoy, Gamze ; Emani, Prashant ; Brannon, Charlotte M. ; Jolanki, Otto A. ; Harmanci, Arif ; Strattan, J. Seth ; Cherry, J. Michael ; Miranker, Andrew D. 
; Gerstein, Mark</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c521t-f7b70fef9eca0b0f25a974153a75dc977728c942636e64775dde0a6660335d593</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Security</topic><topic>data sanitization</topic><topic>functional genomics</topic><topic>genome privacy</topic><topic>Genome, Human</topic><topic>Genomics</topic><topic>Genotype</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>linkage attacks</topic><topic>Phenotype</topic><topic>Phylogeny</topic><topic>Privacy</topic><topic>Reproducibility of Results</topic><topic>RNA-seq</topic><topic>Sequence Analysis, RNA</topic><topic>Single-Cell Analysis</topic><topic>surreptitious DNA sequencing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gürsoy, Gamze</creatorcontrib><creatorcontrib>Emani, Prashant</creatorcontrib><creatorcontrib>Brannon, Charlotte M.</creatorcontrib><creatorcontrib>Jolanki, Otto A.</creatorcontrib><creatorcontrib>Harmanci, Arif</creatorcontrib><creatorcontrib>Strattan, J. Seth</creatorcontrib><creatorcontrib>Cherry, J. 
Michael</creatorcontrib><creatorcontrib>Miranker, Andrew D.</creatorcontrib><creatorcontrib>Gerstein, Mark</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Cell</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gürsoy, Gamze</au><au>Emani, Prashant</au><au>Brannon, Charlotte M.</au><au>Jolanki, Otto A.</au><au>Harmanci, Arif</au><au>Strattan, J. Seth</au><au>Cherry, J. Michael</au><au>Miranker, Andrew D.</au><au>Gerstein, Mark</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Data Sanitization to Reduce Private Information Leakage from Functional Genomics</atitle><jtitle>Cell</jtitle><addtitle>Cell</addtitle><date>2020-11-12</date><risdate>2020</risdate><volume>183</volume><issue>4</issue><spage>905</spage><epage>917.e16</epage><pages>905-917.e16</pages><issn>0092-8674</issn><eissn>1097-4172</eissn><abstract>The generation of functional genomics datasets is surging, because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. 
Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples. [Display omitted] •Surging functional genomics data necessitates improved data-sharing modes•Quantification of private information in these data is done via linkage attacks•A data sanitization protocol grounded in privacy and utility is developed•The sanitized format is compatible with existing file formats and pipelines Growing functional genomics data puts individual privacy at risk via linkage attacks, the risk of which is quantified and can be sanitized using a privacy-preserving data format.</abstract><cop>United States</cop><pub>Elsevier Inc</pub><pmid>33186529</pmid><doi>10.1016/j.cell.2020.09.036</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0092-8674
ispartof Cell, 2020-11, Vol.183 (4), p.905-917.e16
issn 0092-8674
1097-4172
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7672785
source MEDLINE; Cell Press Free Archives; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; ScienceDirect Journals (5 years ago - present)
subjects Computer Security
data sanitization
functional genomics
genome privacy
Genome, Human
Genomics
Genotype
High-Throughput Nucleotide Sequencing
Humans
linkage attacks
Phenotype
Phylogeny
Privacy
Reproducibility of Results
RNA-seq
Sequence Analysis, RNA
Single-Cell Analysis
surreptitious DNA sequencing
title Data Sanitization to Reduce Private Information Leakage from Functional Genomics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T06%3A45%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmed_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data%20Sanitization%20to%20Reduce%20Private%20Information%20Leakage%20from%20Functional%20Genomics&rft.jtitle=Cell&rft.au=G%C3%BCrsoy,%20Gamze&rft.date=2020-11-12&rft.volume=183&rft.issue=4&rft.spage=905&rft.epage=917.e16&rft.pages=905-917.e16&rft.issn=0092-8674&rft.eissn=1097-4172&rft_id=info:doi/10.1016/j.cell.2020.09.036&rft_dat=%3Cpubmed_cross%3E33186529%3C/pubmed_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/33186529&rft_els_id=S0092867420312332&rfr_iscdi=true