Confidence-Ranked Reconstruction of Census Microdata from Published Statistics

A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically...
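
As a hedged illustration of the attack shape described here (published aggregate statistics in, a ranked list of candidate rows out), the following toy sketch may help. It is not the authors' method: a simple randomized bit-flip local search stands in for the paper's non-convex optimizer, which this record does not detail, and ranking rows by how often they recur across independent runs is an assumption about how a confidence ordering could be produced. All names (`marginal_queries`, `answer`, `ranked_reconstruction`), the binary toy dataset, and the choice of 2-way marginal counts as the queries \(Q\) are hypothetical.

```python
import itertools
import random
from collections import Counter

def marginal_queries(n_cols, k=2):
    """All k-way marginal queries over binary columns, as (columns, values) pairs."""
    return [
        (cols, vals)
        for cols in itertools.combinations(range(n_cols), k)
        for vals in itertools.product((0, 1), repeat=k)
    ]

def answer(dataset, queries):
    """Q(D): for each query, the number of rows that match it."""
    return [
        sum(all(row[c] == v for c, v in zip(cols, vals)) for row in dataset)
        for cols, vals in queries
    ]

def error(dataset, queries, target):
    """Squared distance between Q(dataset) and the published statistics."""
    return sum((a - t) ** 2 for a, t in zip(answer(dataset, queries), target))

def reconstruct_once(target, queries, n_rows, n_cols, rng, steps=500):
    """One randomized local-search run (a stand-in optimizer, assumed here):
    flip one random bit at a time and keep flips that do not increase the error."""
    cand = [[rng.randint(0, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    best = error(cand, queries, target)
    for _ in range(steps):
        if best == 0:
            break
        i, j = rng.randrange(n_rows), rng.randrange(n_cols)
        cand[i][j] ^= 1
        new = error(cand, queries, target)
        if new <= best:
            best = new
        else:
            cand[i][j] ^= 1  # revert a flip that made the fit worse
    return cand

def ranked_reconstruction(target, queries, n_rows, n_cols, n_runs=10, seed=0):
    """Run the search n_runs times and rank candidate rows by the number of
    runs in which they appear (each distinct row counted once per run)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        rows = reconstruct_once(target, queries, n_rows, n_cols, rng)
        votes.update({tuple(r) for r in rows})
    return votes.most_common()

# Toy private dataset; only `published` is visible to the attacker.
private = [(0, 1, 1, 0), (1, 1, 0, 0), (1, 0, 1, 1),
           (0, 0, 0, 1), (1, 1, 1, 1), (0, 1, 0, 1)]
queries = marginal_queries(4, k=2)
published = answer(private, queries)
ranking = ranked_reconstruction(published, queries, len(private), 4)
for row, count in ranking[:5]:
    print(row, count, row in private)
```

Each printed line shows a candidate row, the number of runs that produced it, and whether it is actually in the toy private data. In this sketch the vote count plays the role of the confidence signal; whether high-vote rows are in fact more likely to be real depends on the queries and the data, and nothing here reproduces the paper's results.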

Bibliographic Details
Published in: arXiv.org 2023-02
Main authors: Dick, Travis, Dwork, Cynthia, Kearns, Michael, Liu, Terrance, Roth, Aaron, Vietri, Giuseppe, Wu, Zhiwei Steven
Format: Article
Language: eng
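Subjects: Census; Censuses; Computer Science - Computers and Society; Computer Science - Cryptography and Security; Computer Science - Learning; Convexity; Crime; Datasets; Optimization; Privacy; Queries; Reconstruction; Statistics; Theft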
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Dick, Travis
Dwork, Cynthia
Kearns, Michael
Liu, Terrance
Roth, Aaron
Vietri, Giuseppe
Wu, Zhiwei Steven
description A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of \(D\) from aggregate query statistics \(Q(D)\in \mathbb{R}^m\), but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset \(D\) was sampled, demonstrating that they are exploiting information in the aggregate statistics \(Q(D)\), and not simply the overall structure of the distribution. In other words, the queries \(Q(D)\) are permitting reconstruction of elements of this dataset, not the distribution from which \(D\) was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.
doi_str_mv 10.48550/arxiv.2211.03128
format Article
fullrecord <record><control><sourceid>proquest_arxiv</sourceid><recordid>TN_cdi_arxiv_primary_2211_03128</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2733855274</sourcerecordid><originalsourceid>FETCH-LOGICAL-a958-68bb0682237117afc4852a528b1e8a144734c7cdfe66bdb623906df240471ee93</originalsourceid><addsrcrecordid>eNotkE1PAyEYhImJiU3tD_DkJp53hRcW2KPZ-JXUj9TeNywLkdpCBdbov3dtPc1lZjLPIHRBcMVkXeNrFb_dVwVASIUpAXmCZkApKSUDOEOLlDYYY-AC6prO0HMbvHWD8dqUK-U_zFCsjA4-5Tjq7IIvgi1a49OYiienYxhUVoWNYVe8jv3Wpfcp8ZZVdik7nc7RqVXbZBb_Okfru9t1-1AuX-4f25tlqZpallz2PeYSgApChLJ6Wg6qBtkTIxVhTFCmhR6s4bwfeg60wXywwDATxJiGztHlsfYA2-2j26n40_1BdwfoyXF1dOxj-BxNyt0mjNFPmzoQlE5PgWD0F5NLWak</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2733855274</pqid></control><display><type>article</type><title>Confidence-Ranked Reconstruction of Census Microdata from Published Statistics</title><source>arXiv.org</source><source>Free E- Journals</source><creator>Dick, Travis ; Dwork, Cynthia ; Kearns, Michael ; Liu, Terrance ; Roth, Aaron ; Vietri, Giuseppe ; Wu, Zhiwei Steven</creator><creatorcontrib>Dick, Travis ; Dwork, Cynthia ; Kearns, Michael ; Liu, Terrance ; Roth, Aaron ; Vietri, Giuseppe ; Wu, Zhiwei Steven</creatorcontrib><description>A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. 
We empirically demonstrate that our attacks can not only reconstruct full rows of \(D\) from aggregate query statistics \(Q(D)\in \mathbb{R}^m\), but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset \(D\) was sampled, demonstrating that they are exploiting information in the aggregate statistics \(Q(D)\), and not simply the overall structure of the distribution. In other words, the queries \(Q(D)\) are permitting reconstruction of elements of this dataset, not the distribution from which \(D\) was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2211.03128</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Census ; Censuses ; Computer Science - Computers and Society ; Computer Science - Cryptography and Security ; Computer Science - Learning ; Convexity ; Crime ; Datasets ; Optimization ; Privacy ; Queries ; Reconstruction ; Statistics ; Theft</subject><ispartof>arXiv.org, 2023-02</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,784,885,27923</link.rule.ids><backlink>$$Uhttps://doi.org/10.1073/pnas.2218605120$$DView published paper (Access to full text may be restricted)$$Hfree_for_read</backlink><backlink>$$Uhttps://doi.org/10.48550/arXiv.2211.03128$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Dick, Travis</creatorcontrib><creatorcontrib>Dwork, Cynthia</creatorcontrib><creatorcontrib>Kearns, Michael</creatorcontrib><creatorcontrib>Liu, Terrance</creatorcontrib><creatorcontrib>Roth, Aaron</creatorcontrib><creatorcontrib>Vietri, Giuseppe</creatorcontrib><creatorcontrib>Wu, Zhiwei Steven</creatorcontrib><title>Confidence-Ranked Reconstruction of Census Microdata from Published Statistics</title><title>arXiv.org</title><description>A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of \(D\) from aggregate query statistics \(Q(D)\in \mathbb{R}^m\), but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. 
Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset \(D\) was sampled, demonstrating that they are exploiting information in the aggregate statistics \(Q(D)\), and not simply the overall structure of the distribution. In other words, the queries \(Q(D)\) are permitting reconstruction of elements of this dataset, not the distribution from which \(D\) was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.</description><subject>Census</subject><subject>Censuses</subject><subject>Computer Science - Computers and Society</subject><subject>Computer Science - Cryptography and Security</subject><subject>Computer Science - 
Learning</subject><subject>Convexity</subject><subject>Crime</subject><subject>Datasets</subject><subject>Optimization</subject><subject>Privacy</subject><subject>Queries</subject><subject>Reconstruction</subject><subject>Statistics</subject><subject>Theft</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GOX</sourceid><recordid>eNotkE1PAyEYhImJiU3tD_DkJp53hRcW2KPZ-JXUj9TeNywLkdpCBdbov3dtPc1lZjLPIHRBcMVkXeNrFb_dVwVASIUpAXmCZkApKSUDOEOLlDYYY-AC6prO0HMbvHWD8dqUK-U_zFCsjA4-5Tjq7IIvgi1a49OYiienYxhUVoWNYVe8jv3Wpfcp8ZZVdik7nc7RqVXbZBb_Okfru9t1-1AuX-4f25tlqZpallz2PeYSgApChLJ6Wg6qBtkTIxVhTFCmhR6s4bwfeg60wXywwDATxJiGztHlsfYA2-2j26n40_1BdwfoyXF1dOxj-BxNyt0mjNFPmzoQlE5PgWD0F5NLWak</recordid><startdate>20230206</startdate><enddate>20230206</enddate><creator>Dick, Travis</creator><creator>Dwork, Cynthia</creator><creator>Kearns, Michael</creator><creator>Liu, Terrance</creator><creator>Roth, Aaron</creator><creator>Vietri, Giuseppe</creator><creator>Wu, Zhiwei Steven</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230206</creationdate><title>Confidence-Ranked Reconstruction of Census Microdata from Published Statistics</title><author>Dick, Travis ; Dwork, Cynthia ; Kearns, Michael ; Liu, Terrance ; Roth, Aaron ; Vietri, Giuseppe ; Wu, Zhiwei 
Steven</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a958-68bb0682237117afc4852a528b1e8a144734c7cdfe66bdb623906df240471ee93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Census</topic><topic>Censuses</topic><topic>Computer Science - Computers and Society</topic><topic>Computer Science - Cryptography and Security</topic><topic>Computer Science - Learning</topic><topic>Convexity</topic><topic>Crime</topic><topic>Datasets</topic><topic>Optimization</topic><topic>Privacy</topic><topic>Queries</topic><topic>Reconstruction</topic><topic>Statistics</topic><topic>Theft</topic><toplevel>online_resources</toplevel><creatorcontrib>Dick, Travis</creatorcontrib><creatorcontrib>Dwork, Cynthia</creatorcontrib><creatorcontrib>Kearns, Michael</creatorcontrib><creatorcontrib>Liu, Terrance</creatorcontrib><creatorcontrib>Roth, Aaron</creatorcontrib><creatorcontrib>Vietri, Giuseppe</creatorcontrib><creatorcontrib>Wu, Zhiwei Steven</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>arXiv Computer Science</collection><collection>arXiv.org</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dick, Travis</au><au>Dwork, Cynthia</au><au>Kearns, Michael</au><au>Liu, Terrance</au><au>Roth, Aaron</au><au>Vietri, Giuseppe</au><au>Wu, Zhiwei Steven</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Confidence-Ranked Reconstruction of Census Microdata from Published Statistics</atitle><jtitle>arXiv.org</jtitle><date>2023-02-06</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of \(D\) from aggregate query statistics \(Q(D)\in \mathbb{R}^m\), but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset \(D\) was sampled, demonstrating that they are exploiting information in the aggregate statistics \(Q(D)\), and not simply the overall structure of the distribution. In other words, the queries \(Q(D)\) are permitting reconstruction of elements of this dataset, not the distribution from which \(D\) was drawn. 
These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2211.03128</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-02
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2211_03128
source arXiv.org; Free E- Journals
subjects Census
Censuses
Computer Science - Computers and Society
Computer Science - Cryptography and Security
Computer Science - Learning
Convexity
Crime
Datasets
Optimization
Privacy
Queries
Reconstruction
Statistics
Theft
title Confidence-Ranked Reconstruction of Census Microdata from Published Statistics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T04%3A42%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Confidence-Ranked%20Reconstruction%20of%20Census%20Microdata%20from%20Published%20Statistics&rft.jtitle=arXiv.org&rft.au=Dick,%20Travis&rft.date=2023-02-06&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2211.03128&rft_dat=%3Cproquest_arxiv%3E2733855274%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2733855274&rft_id=info:pmid/&rfr_iscdi=true