PrivKVM: Revisiting Key-Value Statistics Estimation With Local Differential Privacy

Bibliographic details
Published in: IEEE transactions on dependable and secure computing 2023-01, Vol.20 (1), p.17-35
Main authors: Ye, Qingqing, Hu, Haibo, Meng, Xiaofeng, Zheng, Huadi, Huang, Kai, Fang, Chengfang, Shi, Jie
Format: Article
Language: eng
Subjects:
container_end_page 35
container_issue 1
container_start_page 17
container_title IEEE transactions on dependable and secure computing
container_volume 20
creator Ye, Qingqing
Hu, Haibo
Meng, Xiaofeng
Zheng, Huadi
Huang, Kai
Fang, Chengfang
Shi, Jie
description A key factor in big data analytics and artificial intelligence is the collection of user data from a large population. However, collecting user data comes at the price of privacy risks, not only for users but also for businesses that are vulnerable to internal and external data breaches. To address privacy issues, local differential privacy (LDP) has been proposed to enable an untrusted collector to obtain accurate statistical estimates on sensitive user data (e.g., location, health, and financial data) without accessing the true records. As key-value is an extremely popular NoSQL data model, a few works in the literature study LDP-based statistical estimation on key-value data. However, these works have major limitations, including support for only a small key space, a fixed key collection range, difficulty in choosing an appropriate padding length, and high communication cost. In this article, we propose a two-phase mechanism, PrivKVM*, as an optimized and highly complete solution to LDP-based key-value data collection and statistics estimation. We verify its correctness and effectiveness through rigorous theoretical analysis and extensive experimental results.
doi_str_mv 10.1109/TDSC.2021.3107512
format Article
publisher IEEE, Washington
coden ITDSCM
tpages 19
rights Copyright IEEE Computer Society 2023
orcidid https://orcid.org/0000-0003-1547-2847
https://orcid.org/0000-0001-9857-654X
https://orcid.org/0000-0003-1224-9885
https://orcid.org/0000-0002-9008-2112
linktohtml https://ieeexplore.ieee.org/document/9524509
identifier ISSN: 1545-5971
ispartof IEEE transactions on dependable and secure computing, 2023-01, Vol.20 (1), p.17-35
issn 1545-5971
1941-0018
language eng
recordid cdi_proquest_journals_2765185879
source IEEE Electronic Library (IEL)
subjects Artificial intelligence
Big Data
Data collection
Differential privacy
Estimation
Extreme values
Frequency estimation
histogram
Histograms
Key-value data
local differential privacy
Perturbation methods
Privacy
privacy-preserving data collection
statistics estimation
title PrivKVM: Revisiting Key-Value Statistics Estimation With Local Differential Privacy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T20%3A51%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PrivKVM:%20Revisiting%20Key-Value%20Statistics%20Estimation%20With%20Local%20Differential%20Privacy&rft.jtitle=IEEE%20transactions%20on%20dependable%20and%20secure%20computing&rft.au=Ye,%20Qingqing&rft.date=2023-01&rft.volume=20&rft.issue=1&rft.spage=17&rft.epage=35&rft.pages=17-35&rft.issn=1545-5971&rft.eissn=1941-0018&rft.coden=ITDSCM&rft_id=info:doi/10.1109/TDSC.2021.3107512&rft_dat=%3Cproquest_RIE%3E2765185879%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2765185879&rft_id=info:pmid/&rft_ieee_id=9524509&rfr_iscdi=true