Differentially Private Stream Processing at Scale

Bibliographic details
Main authors:	Zhang, Bing ; Doroshenko, Vadym ; Kairouz, Peter ; Steinke, Thomas ; Thakurta, Abhradeep ; Ma, Ziyin ; Cohen, Eidan ; Apte, Himani ; Spacek, Jodi
Format:	Article
Language:	eng
Subjects:	Computer Science - Cryptography and Security ; Computer Science - Databases

creator	Zhang, Bing ; Doroshenko, Vadym ; Kairouz, Peter ; Steinke, Thomas ; Thakurta, Abhradeep ; Ma, Ziyin ; Cohen, Eidan ; Apte, Himani ; Spacek, Jodi
description	We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system, Differential Privacy SQL Pipelines (DP-SQLP), is built using a streaming framework similar to Spark Streaming, on top of the Spanner database and the F1 query engine from Google. In designing DP-SQLP we make both algorithmic and systems advances: we (i) design a novel user-level DP key selection algorithm that can operate on an unbounded set of possible keys and can scale to one billion keys contributed by users, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the length of the stream. We empirically demonstrate the efficacy of DP-SQLP, obtaining at least a $16\times$ reduction in error over the meaningful baselines we consider. We used DP-SQLP to implement streaming, differentially private user impression counts for Google Shopping. The streaming DP algorithms are further applied to Google Trends.
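Item (iii) of the description refers to techniques from DP continual observation. A standard textbook instance is the binary-tree mechanism for continual counting; the minimal sketch below shows it for a single counter, assuming event-level privacy and per-step values in [0, 1]. It is not the DP-SQLP implementation: the function names `laplace` and `binary_tree_counts` are hypothetical, and a continual DP histogram would run one such counter per selected key.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def binary_tree_counts(stream, epsilon, rng=None):
    """Release a noisy running sum after each element of `stream`.

    Textbook binary-tree mechanism for continual counting (event-level DP).
    Assumption: every element lies in [0, 1] and neighbouring streams differ
    in one element. Each element falls in exactly one dyadic block per level,
    so it affects at most `levels` noisy nodes; Laplace noise of scale
    levels / epsilon per node therefore makes the whole output sequence
    epsilon-DP.
    """
    if rng is None:
        rng = random.Random()
    T = len(stream)
    levels = max(1, math.ceil(math.log2(T + 1)))  # ensures 2**levels > T
    scale = levels / epsilon
    noisy_node = {}  # (level, index) -> noisy sum of one dyadic block
    outputs = []
    for t in range(1, T + 1):
        # Split the prefix [0, t) into dyadic blocks following the binary
        # expansion of t (largest block first); at most `levels` blocks.
        total, start, remaining = 0.0, 0, t
        for level in reversed(range(levels)):
            size = 1 << level
            if remaining >= size:
                key = (level, start // size)
                if key not in noisy_node:
                    # Noise is drawn once per node and reused by later outputs.
                    block_sum = sum(stream[start:start + size])
                    noisy_node[key] = block_sum + laplace(scale, rng)
                total += noisy_node[key]
                start += size
                remaining -= size
        outputs.append(total)
    return outputs
```

For example, `binary_tree_counts([1, 0, 1, 1, 0, 1, 1], epsilon=1.0, rng=random.Random(0))` returns seven noisy prefix sums. Each release adds at most `levels` noisy nodes, so its error grows only polylogarithmically in the stream length, whereas adding fresh noise to every prefix sum would, by composition, require noise on the order of T/epsilon. The sketch deliberately omits user-level contribution bounding and DP key selection, which the description lists as separate contributions.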
doi_str_mv	10.48550/arxiv.2303.18086
format	Article
creationdate	2023-03-31
rights	http://creativecommons.org/licenses/by/4.0
linktorsrc	https://arxiv.org/abs/2303.18086
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2303.18086
language	eng
recordid	cdi_arxiv_primary_2303_18086
source	arXiv.org
subjects	Computer Science - Cryptography and Security ; Computer Science - Databases
title	Differentially Private Stream Processing at Scale
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T16%3A55%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Differentially%20Private%20Stream%20Processing%20at%20Scale&rft.au=Zhang,%20Bing&rft.date=2023-03-31&rft_id=info:doi/10.48550/arxiv.2303.18086&rft_dat=%3Carxiv_GOX%3E2303_18086%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true