Protecting Genomic Sequence Anonymity with Generalization Lattices

Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates suc...

Bibliographic details
Published in:	Methods of information in medicine 2005-01, Vol.44 (5), p.687-692
Main author:	Malin, B. A.
Format:	Article
Language:	eng
Subjects:	Algorithms anonymity Base Sequence Databases, Nucleic Acid genetic variation genomic data Humans Original Article Privacy sequence analysis United States
Online access:	Full text

container_end_page	692
container_issue	5
container_start_page	687
container_title	Methods of information in medicine
container_volume	44
creator	Malin, B. A.
description	Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequence privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific data-sharing scenarios.
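As a rough illustration of the generalization idea in the description above, the sketch below builds a lattice over the four bases from the standard IUPAC ambiguity codes, generalizes two aligned sequences position by position to their most specific common code, and counts the lattice levels climbed. It is not the published DNALA algorithm: the function names, the level-based cost, and the restriction to a single pair of sequences (k = 2) are assumptions made for illustration only.

```python
# Hypothetical sketch, not the DNALA implementation from the paper.
# IUPAC nucleotide ambiguity codes mapped to the set of bases each one covers.
IUPAC = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"),  # purine
    "Y": frozenset("CT"),  # pyrimidine
    "S": frozenset("CG"), "W": frozenset("AT"),
    "K": frozenset("GT"), "M": frozenset("AC"),
    "B": frozenset("CGT"), "D": frozenset("AGT"),
    "H": frozenset("ACT"), "V": frozenset("ACG"),
    "N": frozenset("ACGT"),  # any base
}
# Reverse lookup: set of bases -> code. Every non-empty subset has a code.
CODE_FOR = {bases: code for code, bases in IUPAC.items()}


def generalize(x: str, y: str) -> str:
    """Return the most specific IUPAC code that covers both input codes."""
    return CODE_FOR[IUPAC[x] | IUPAC[y]]


def level(code: str) -> int:
    """Lattice level: 0 for a single base, up to 3 for N."""
    return len(IUPAC[code]) - 1


def generalize_pair(s1: str, s2: str) -> tuple[str, int]:
    """Generalize two aligned, equal-length sequences position by position.

    Returns the shared generalized sequence and an assumed cost: the total
    number of lattice levels climbed by both sequences together.
    """
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    out, cost = [], 0
    for a, b in zip(s1, s2):
        g = generalize(a, b)
        out.append(g)
        cost += 2 * level(g) - level(a) - level(b)
    return "".join(out), cost


shared, cost = generalize_pair("ACGT", "GCTT")
print(shared, cost)  # prints "RCKT 4"
```

Releasing the shared sequence in place of both originals makes the two records indistinguishable from each other, which is the k = 2 case of k-anonymity. The cost value gives one possible way to compare candidate pairings, since a lower total means less information was generalized away.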
doi_str_mv	10.1055/s-0038-1634025
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_16400377</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>69057639</sourcerecordid><originalsourceid>FETCH-LOGICAL-c668t-2da9e01f45e55cdca28c70c03b36da19ad9a70bbc00e865e139a86cee43760823</originalsourceid><addsrcrecordid>eNrFkEGL1DAUx4so7uzq1aPMyVvXl6ZJ2-O66K4woKCCt0eavtlmaZOapJaZD-PBT2qGGdSLehQCD_J-78-fX5Y9Y3DJQIiXIQfgdc4kL6EQD7JVIRjLKxCfH2YrgELmrKjgLDsP4R4A6hrKx9kZk2U6q6pV9uq9d5F0NPZufUPWjUavP9CXmaym9ZV1djeauFsvJvaHPXk1mL2Kxtn1RsVoNIUn2aOtGgI9Pc2L7NOb1x-vb_PNu5u311ebXEtZx7zoVEPAtqUgIXSnVVHrCjTwlstOsUZ1jaqgbTUA1VIQ442qpSYqeSWhLvhF9uKYO3mXCoaIowmahkFZcnNA2YCoJG_-CRZQMODAEnh5BLV3IXja4uTNqPwOGeBBLwY86MWT3nTw_JQ8tyN1v_CTzwTkRyD2hkbCezd7m6T8OfD7kQ-6TzrVTP5naB_jhMuy4G-7jg5vVHdqbyzhTC35YHQfcU8mJtCbbSSLCvc4UuxdF1A7m75iQOV1b74mH8p2yndoQpgJw0TaqCGF2jlob6aIQpYcQ--W1GEcUslv_7ukrOVfCv4A47cFJg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>20210301</pqid></control><display><type>article</type><title>Protecting Genomic Sequence Anonymity with Generalization Lattices</title><source>MEDLINE</source><source>Thieme Connect Journals</source><creator>Malin, B. A.</creator><creatorcontrib>Malin, B. A.</creatorcontrib><description>Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. 
Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k -1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.</description><identifier>ISSN: 0026-1270</identifier><identifier>EISSN: 2511-705X</identifier><identifier>DOI: 10.1055/s-0038-1634025</identifier><identifier>PMID: 16400377</identifier><language>eng</language><publisher>Germany: Schattauer Verlag für Medizin und Naturwissenschaften</publisher><subject>Algorithms ; anonymity ; Base Sequence ; Databases, Nucleic Acid ; genetic variation ; genomic data ; Humans ; Original Article ; Privacy ; sequence analysis ; United States</subject><ispartof>Methods of information in medicine, 2005-01, Vol.44 (5), 
p.687-692</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c668t-2da9e01f45e55cdca28c70c03b36da19ad9a70bbc00e865e139a86cee43760823</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.thieme-connect.de/products/ejournals/pdf/10.1055/s-0038-1634025.pdf$$EPDF$$P50$$Gthieme$$H</linktopdf><link.rule.ids>314,777,781,3004,3005,27905,27906,54540</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16400377$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Malin, B. A.</creatorcontrib><title>Protecting Genomic Sequence Anonymity with Generalization Lattices</title><title>Methods of information in medicine</title><addtitle>Methods Inf Med</addtitle><description>Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k -1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. 
adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.</description><subject>Algorithms</subject><subject>anonymity</subject><subject>Base Sequence</subject><subject>Databases, Nucleic Acid</subject><subject>genetic variation</subject><subject>genomic data</subject><subject>Humans</subject><subject>Original Article</subject><subject>Privacy</subject><subject>sequence analysis</subject><subject>United States</subject><issn>0026-1270</issn><issn>2511-705X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNrFkEGL1DAUx4so7uzq1aPMyVvXl6ZJ2-O66K4woKCCt0eavtlmaZOapJaZD-PBT2qGGdSLehQCD_J-78-fX5Y9Y3DJQIiXIQfgdc4kL6EQD7JVIRjLKxCfH2YrgELmrKjgLDsP4R4A6hrKx9kZk2U6q6pV9uq9d5F0NPZufUPWjUavP9CXmaym9ZV1djeauFsvJvaHPXk1mL2Kxtn1RsVoNIUn2aOtGgI9Pc2L7NOb1x-vb_PNu5u311ebXEtZx7zoVEPAtqUgIXSnVVHrCjTwlstOsUZ1jaqgbTUA1VIQ442qpSYqeSWhLvhF9uKYO3mXCoaIowmahkFZcnNA2YCoJG_-CRZQMODAEnh5BLV3IXja4uTNqPwOGeBBLwY86MWT3nTw_JQ8tyN1v_CTzwTkRyD2hkbCezd7m6T8OfD7kQ-6TzrVTP5naB_jhMuy4G-7jg5vVHdqbyzhTC35YHQfcU8mJtCbbSSLCvc4UuxdF1A7m75iQOV1b74mH8p2yndoQpgJw0TaqCGF2jlob6aIQpYcQ--W1GEcUslv_7ukrOVfCv4A47cFJg</recordid><startdate>20050101</startdate><enddate>20050101</enddate><creator>Malin, B. 
A.</creator><general>Schattauer Verlag für Medizin und Naturwissenschaften</general><general>Schattauer GmbH</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>20050101</creationdate><title>Protecting Genomic Sequence Anonymity with Generalization Lattices</title><author>Malin, B. A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c668t-2da9e01f45e55cdca28c70c03b36da19ad9a70bbc00e865e139a86cee43760823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Algorithms</topic><topic>anonymity</topic><topic>Base Sequence</topic><topic>Databases, Nucleic Acid</topic><topic>genetic variation</topic><topic>genomic data</topic><topic>Humans</topic><topic>Original Article</topic><topic>Privacy</topic><topic>sequence analysis</topic><topic>United States</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Malin, B. 
A.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Methods of information in medicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Malin, B. A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Protecting Genomic Sequence Anonymity with Generalization Lattices</atitle><jtitle>Methods of information in medicine</jtitle><addtitle>Methods Inf Med</addtitle><date>2005-01-01</date><risdate>2005</risdate><volume>44</volume><issue>5</issue><spage>687</spage><epage>692</epage><pages>687-692</pages><issn>0026-1270</issn><eissn>2511-705X</eissn><abstract>Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. 
Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k -1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.</abstract><cop>Germany</cop><pub>Schattauer Verlag für Medizin und Naturwissenschaften</pub><pmid>16400377</pmid><doi>10.1055/s-0038-1634025</doi><tpages>6</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0026-1270
ispartof	Methods of information in medicine, 2005-01, Vol.44 (5), p.687-692
issn	0026-1270 2511-705X
language	eng
recordid	cdi_pubmed_primary_16400377
source	MEDLINE; Thieme Connect Journals
subjects	Algorithms anonymity Base Sequence Databases, Nucleic Acid genetic variation genomic data Humans Original Article Privacy sequence analysis United States
title	Protecting Genomic Sequence Anonymity with Generalization Lattices
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T16%3A30%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Protecting%20Genomic%20Sequence%20Anonymity%20with%20Generalization%20Lattices&rft.jtitle=Methods%20of%20information%20in%20medicine&rft.au=Malin,%20B.%20A.&rft.date=2005-01-01&rft.volume=44&rft.issue=5&rft.spage=687&rft.epage=692&rft.pages=687-692&rft.issn=0026-1270&rft.eissn=2511-705X&rft_id=info:doi/10.1055/s-0038-1634025&rft_dat=%3Cproquest_pubme%3E69057639%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=20210301&rft_id=info:pmid/16400377&rfr_iscdi=true