Re-identification Methods for Masked Microdata

Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly n...

Bibliographic details
Main author:	Winkler, William E.
Format:	Book chapter
Language:	English
Subjects:	Applied sciences ; Computer science; control theory; systems ; Disclosure Risk ; Exact sciences and technology ; Individual Entity ; Information systems. Data bases ; Memory and file management (including protection and security) ; Memory organisation. Data processing ; Record Linkage ; Software ; Survey Research Method ; Synthetic Data
Online access:	Full text

container_end_page	230
container_issue
container_start_page	216
container_title
container_volume	3050
creator	Winkler, William E.
description	Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly no distortion in a few aggregate or marginal statistics. In record linkage (as in nearest neighbor methods), metrics are used to determine how close a value of a variable in one record is to the value of the corresponding variable in another record. If a sufficient number of variables in one record have values that are close to values in another record, then the records may be a match and correspond to the same entity. This paper shows that it is possible to create metrics for which re-identification is straightforward in many situations where masking is currently done. We begin by demonstrating how to quickly construct metrics for continuous variables that have been micro-aggregated one at a time using conventional methods. We extend the methods to situations where rank swapping is performed and discuss the situation where several continuous variables are micro-aggregated simultaneously. We close by indicating how metrics might be created for situations of synthetic microdata satisfying several sets of analytic constraints.
doi_str_mv	10.1007/978-3-540-25955-8_17
format	Book Chapter
contributor	Torra, Vicenç ; Domingo-Ferrer, Josep
publisher	Germany: Springer Berlin / Heidelberg
rights	Springer-Verlag Berlin Heidelberg 2004 ; 2004 INIST-CNRS
relation	Lecture Notes in Computer Science
creationdate	2004
tpages	15
isbn	9783540221180 3540221182
eisbn	9783540259558 3540259554
oclcid	934978567
lccallnum	QA76.9.D35
toplevel	peer_reviewed ; online_resources
linktopdf	https://link.springer.com/content/pdf/10.1007/978-3-540-25955-8_17
linktohtml	https://link.springer.com/10.1007/978-3-540-25955-8_17
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=15875906
fulltext	fulltext
identifier	ISSN: 0302-9743
ispartof	Lecture notes in computer science, 2004, Vol.3050, p.216-230
issn	0302-9743 1611-3349
language	eng
recordid	cdi_pascalfrancis_primary_15875906
source	Springer Books
subjects	Applied sciences ; Computer science; control theory; systems ; Disclosure Risk ; Exact sciences and technology ; Individual Entity ; Information systems. Data bases ; Memory and file management (including protection and security) ; Memory organisation. Data processing ; Record Linkage ; Software ; Survey Research Method ; Synthetic Data
title	Re-identification Methods for Masked Microdata
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T02%3A17%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=bookitem&rft.atitle=Re-identification%20Methods%20for%20Masked%20Microdata&rft.btitle=Lecture%20notes%20in%20computer%20science&rft.au=Winkler,%20William%20E.&rft.date=2004&rft.volume=3050&rft.spage=216&rft.epage=230&rft.pages=216-230&rft.issn=0302-9743&rft.eissn=1611-3349&rft.isbn=9783540221180&rft.isbn_list=3540221182&rft_id=info:doi/10.1007/978-3-540-25955-8_17&rft_dat=%3Cproquest_pasca%3EEBC3087404_23_225%3C/proquest_pasca%3E%3Curl%3E%3C/url%3E&rft.eisbn=9783540259558&rft.eisbn_list=3540259554&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=EBC3087404_23_225&rft_id=info:pmid/&rfr_iscdi=true
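
The description field above outlines the idea of matching a masked record to an original one by checking how close their values are, variable by variable, after each continuous variable has been micro-aggregated on its own. The following is a minimal sketch of that idea and not the chapter's actual metric: the function names `microaggregate` and `candidates`, the fixed group size `k`, the summed absolute difference used as the distance, and the toy data are all assumptions made for illustration.

```python
def microaggregate(values, k=3):
    """Univariate micro-aggregation (illustrative): sort the values,
    form groups of k consecutive values, and replace every value
    by the mean of its group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        # Assumption: a short final group is merged into the previous one.
        if len(group) < k and start > 0:
            group = order[start - k:]
        group_mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = group_mean
    return masked


def candidates(record, masked_records):
    """Return the indices of the masked records closest to `record`
    under a summed absolute difference (a hypothetical stand-in metric)."""
    dists = [sum(abs(a - b) for a, b in zip(record, m)) for m in masked_records]
    best = min(dists)
    return [j for j, d in enumerate(dists) if d == best]


# Toy data: two continuous variables, each micro-aggregated independently.
x = [1, 2, 3, 10, 11, 12]
y = [5, 50, 6, 51, 7, 52]
original = list(zip(x, y))
masked = list(zip(microaggregate(x), microaggregate(y)))

for i, rec in enumerate(original):
    print(i, candidates(rec, masked))
```

In this toy file, records 1 and 4 each have a single nearest masked record, which is their own masked image, because the groups formed for `x` and for `y` overlap in only one record. The remaining records tie with one other record. With more independently masked variables, such ties would be expected to become rarer, which is the intuition the abstract points to.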