PreFair: Privately Generating Justifiably Fair Synthetic Data

When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the...

Bibliographic Details
Published in: Proceedings of the VLDB Endowment 2023-02, Vol.16 (6), p.1573-1586
Main authors: Pujol, David; Gilad, Amir; Machanavajjhala, Ashwin
Format: Article
Language: English
Online access: Full text
container_end_page 1586
container_issue 6
container_start_page 1573
container_title Proceedings of the VLDB Endowment
container_volume 16
creator Pujol, David
Gilad, Amir
Machanavajjhala, Ashwin
description When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.
doi_str_mv 10.14778/3583140.3583168
format Article
fullrecord <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_14778_3583140_3583168</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_14778_3583140_3583168</sourcerecordid><originalsourceid>FETCH-LOGICAL-c196t-87b5f35124b8785fbb9d71e48cfe5a2738287f6785d5d1ca1b9658332f69d26d3</originalsourceid><addsrcrecordid>eNpNj0FLAzEUhIMoWKt3j_sHtuYlm-RF8CDVVkuhBfW8JLuJRuoqSRT233ete_A0w8ww8BFyCXQGlVJ4xQVyqOjsoBKPyISBoCVSrY7_-VNyltI7pRIl4ITcbKNbmBCvi20MPya7XV8sXeeiyaF7LVbfKQcfjB3i31nx1Hf5zeXQFHcmm3Ny4s0uuYtRp-Rlcf88fyjXm-Xj_HZdNqBlLlFZ4bkAVllUKLy1ulXgKmy8E4YpjgyVl0PVihYaA1bLgYIzL3XLZMunhP79NvEzpeh8_RXDh4l9DbQ-4Ncjfj3i8z1d3Ew1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>PreFair: Privately Generating Justifiably Fair Synthetic Data</title><source>ACM Digital Library Complete</source><creator>Pujol, David ; Gilad, Amir ; Machanavajjhala, Ashwin</creator><creatorcontrib>Pujol, David ; Gilad, Amir ; Machanavajjhala, Ashwin</creatorcontrib><description>When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. 
We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.</description><identifier>ISSN: 2150-8097</identifier><identifier>EISSN: 2150-8097</identifier><identifier>DOI: 10.14778/3583140.3583168</identifier><language>eng</language><ispartof>Proceedings of the VLDB Endowment, 2023-02, Vol.16 (6), p.1573-1586</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c196t-87b5f35124b8785fbb9d71e48cfe5a2738287f6785d5d1ca1b9658332f69d26d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Pujol, David</creatorcontrib><creatorcontrib>Gilad, Amir</creatorcontrib><creatorcontrib>Machanavajjhala, Ashwin</creatorcontrib><title>PreFair: Privately Generating Justifiably Fair Synthetic Data</title><title>Proceedings of the VLDB Endowment</title><description>When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. 
However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.</description><issn>2150-8097</issn><issn>2150-8097</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpNj0FLAzEUhIMoWKt3j_sHtuYlm-RF8CDVVkuhBfW8JLuJRuoqSRT233ete_A0w8ww8BFyCXQGlVJ4xQVyqOjsoBKPyISBoCVSrY7_-VNyltI7pRIl4ITcbKNbmBCvi20MPya7XV8sXeeiyaF7LVbfKQcfjB3i31nx1Hf5zeXQFHcmm3Ny4s0uuYtRp-Rlcf88fyjXm-Xj_HZdNqBlLlFZ4bkAVllUKLy1ulXgKmy8E4YpjgyVl0PVihYaA1bLgYIzL3XLZMunhP79NvEzpeh8_RXDh4l9DbQ-4Ncjfj3i8z1d3Ew1</recordid><startdate>20230201</startdate><enddate>20230201</enddate><creator>Pujol, David</creator><creator>Gilad, Amir</creator><creator>Machanavajjhala, Ashwin</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20230201</creationdate><title>PreFair: Privately Generating Justifiably Fair Synthetic Data</title><author>Pujol, David ; Gilad, Amir ; Machanavajjhala, 
Ashwin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c196t-87b5f35124b8785fbb9d71e48cfe5a2738287f6785d5d1ca1b9658332f69d26d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Pujol, David</creatorcontrib><creatorcontrib>Gilad, Amir</creatorcontrib><creatorcontrib>Machanavajjhala, Ashwin</creatorcontrib><collection>CrossRef</collection><jtitle>Proceedings of the VLDB Endowment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Pujol, David</au><au>Gilad, Amir</au><au>Machanavajjhala, Ashwin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>PreFair: Privately Generating Justifiably Fair Synthetic Data</atitle><jtitle>Proceedings of the VLDB Endowment</jtitle><date>2023-02-01</date><risdate>2023</risdate><volume>16</volume><issue>6</issue><spage>1573</spage><epage>1586</epage><pages>1573-1586</pages><issn>2150-8097</issn><eissn>2150-8097</eissn><abstract>When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data generation. However, such systems may preserve or even magnify properties of the data that make it unfair, rendering the synthetic data unfit for use. In this work, we present PreFair, a system that allows for DP fair synthetic data generation. PreFair extends the state-of-the-art DP data generation mechanisms by incorporating a causal fairness criterion that ensures fair synthetic data. 
We adapt the notion of justifiable fairness to fit the synthetic data generation scenario. We further study the problem of generating DP fair synthetic data, showing its intractability and designing algorithms that are optimal under certain assumptions. We also provide an extensive experimental evaluation, showing that PreFair generates synthetic data that is significantly fairer than the data generated by leading DP data generation mechanisms, while remaining faithful to the private data.</abstract><doi>10.14778/3583140.3583168</doi><tpages>14</tpages></addata></record>
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2023-02, Vol.16 (6), p.1573-1586
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_3583140_3583168
source ACM Digital Library Complete
title PreFair: Privately Generating Justifiably Fair Synthetic Data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T03%3A32%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PreFair:%20Privately%20Generating%20Justifiably%20Fair%20Synthetic%20Data&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Pujol,%20David&rft.date=2023-02-01&rft.volume=16&rft.issue=6&rft.spage=1573&rft.epage=1586&rft.pages=1573-1586&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3583140.3583168&rft_dat=%3Ccrossref%3E10_14778_3583140_3583168%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true