Confidence-Ranked Reconstruction of Census Microdata from Published Statistics
A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically...
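Published in: | arXiv.org 2023-02 |
---|---|
Main authors: | Dick, Travis; Dwork, Cynthia; Kearns, Michael; Liu, Terrance; Roth, Aaron; Vietri, Giuseppe; Wu, Zhiwei Steven |
Format: | Article |
Language: | eng |
Subjects: | Census; Censuses; Computer Science - Computers and Society; Computer Science - Cryptography and Security; Computer Science - Learning; Convexity; Crime; Datasets; Optimization; Privacy; Queries; Reconstruction; Statistics; Theft |
Online access: | Full text |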
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Dick, Travis; Dwork, Cynthia; Kearns, Michael; Liu, Terrance; Roth, Aaron; Vietri, Giuseppe; Wu, Zhiwei Steven |
description | A reconstruction attack on a private dataset \(D\) takes as input some publicly accessible information about the dataset and produces a list of candidate elements of \(D\). We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of \(D\) from aggregate query statistics \(Q(D)\in \mathbb{R}^m\), but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data, providing a signature that could be used for prioritizing reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform those that are based only on access to a public distribution or population from which the private dataset \(D\) was sampled, demonstrating that they are exploiting information in the aggregate statistics \(Q(D)\), and not simply the overall structure of the distribution. In other words, the queries \(Q(D)\) are permitting reconstruction of elements of this dataset, not the distribution from which \(D\) was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks in releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy. |
doi_str_mv | 10.48550/arxiv.2211.03128 |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
date | 2023-02-06 |
rights | 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
backlink | https://doi.org/10.1073/pnas.2218605120 (View published paper; access to full text may be restricted) |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-02 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2211_03128 |
source | arXiv.org; Free E- Journals |
subjects | Census; Censuses; Computer Science - Computers and Society; Computer Science - Cryptography and Security; Computer Science - Learning; Convexity; Crime; Datasets; Optimization; Privacy; Queries; Reconstruction; Statistics; Theft |
title | Confidence-Ranked Reconstruction of Census Microdata from Published Statistics |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T04%3A42%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Confidence-Ranked%20Reconstruction%20of%20Census%20Microdata%20from%20Published%20Statistics&rft.jtitle=arXiv.org&rft.au=Dick,%20Travis&rft.date=2023-02-06&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2211.03128&rft_dat=%3Cproquest_arxiv%3E2733855274%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2733855274&rft_id=info:pmid/&rfr_iscdi=true |
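
The description above outlines an attack that takes aggregate statistics \(Q(D)\), runs a randomized optimizer, and ranks the reconstructed rows by confidence. The following is a toy sketch of that general idea only, not the authors' method: it assumes binary features, takes \(Q\) to be all 2-way marginal counts, and swaps in plain randomized hill climbing for the non-convex optimizer. Rows are ranked by how many independent runs produce them. All function names (`marginals`, `local_search`, `ranked_reconstruction`) and parameters are hypothetical.

```python
import itertools
import random
from collections import Counter


def marginals(rows, n_features):
    """Q(D): counts for every 2-way marginal over binary features (an assumption)."""
    stats = []
    for i, j in itertools.combinations(range(n_features), 2):
        for a, b in itertools.product((0, 1), repeat=2):
            stats.append(sum(1 for r in rows if r[i] == a and r[j] == b))
    return stats


def error(candidate, target, n_features):
    """Squared distance between the candidate's statistics and the published ones."""
    answers = marginals(candidate, n_features)
    return sum((x - y) ** 2 for x, y in zip(answers, target))


def local_search(target, n_rows, n_features, rng, steps=2000):
    """Randomized hill climbing: flip one bit, keep the flip if the error does not grow."""
    cand = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    best = error(cand, target, n_features)
    for _ in range(steps):
        if best == 0:
            break
        r, c = rng.randrange(n_rows), rng.randrange(n_features)
        cand[r][c] ^= 1
        e = error(cand, target, n_features)
        if e <= best:
            best = e
        else:
            cand[r][c] ^= 1  # undo a flip that made the fit worse
    return [tuple(row) for row in cand]


def ranked_reconstruction(target, n_rows, n_features, runs=20, seed=0):
    """Run the search several times; rank each distinct row by the number of runs containing it."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        votes.update(set(local_search(target, n_rows, n_features, rng)))
    return votes.most_common()


if __name__ == "__main__":
    private = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1)]  # made-up toy data
    published = marginals(private, 3)
    for row, count in ranked_reconstruction(published, n_rows=4, n_features=3):
        print(row, count, row in private)
```

Rows that recur across many random restarts are the ones the published statistics constrain most tightly, which is the intuition behind using run frequency as a confidence score in this sketch. Whether a given row is actually recovered depends on the data and the queries.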