PSynDB: accurate and accessible private data generation


Bibliographic details
Published in: Proceedings of the VLDB Endowment 2019-08, Vol.12 (12), p.1918-1921
Main authors: Huang, Zhiqi; McKenna, Ryan; Bissias, George; Miklau, Gerome; Hay, Michael; Machanavajjhala, Ashwin
Format: Article
Language: English
Online access: Full text
Description
Abstract: Across many application domains, trusted parties who collect sensitive information need mechanisms to safely disseminate data. A favored approach is to generate synthetic data: a dataset similar to the original, hopefully retaining its statistical features, but one that does not reveal the private information of contributors to the data. We present PSynDB, a web-based synthetic table generator that is built on recent privacy technologies [10,11,15]. PSynDB satisfies the formal guarantee of differential privacy and generates synthetic tables with high accuracy for tasks that the user specifies as important. PSynDB allows users to browse expected error rates before running the mechanism, a useful feature for making important policy decisions, such as setting the privacy loss budget. When the user has finished configuration, the tool outputs a data synthesis program that can be ported to a trusted environment. There it can be safely executed on the private data to produce the private synthetic dataset for broad dissemination.
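The abstract's point that expected error can be inspected before any private data is touched can be illustrated with a minimal sketch. It assumes the textbook Laplace mechanism, not PSynDB's actual algorithms (which the abstract attributes to [10,11,15]); the function names and the default sensitivity of 1 are hypothetical choices made for illustration. Because Laplace noise with scale b = sensitivity / epsilon does not depend on the data, its expected absolute error (b) and root-mean-square error (sqrt(2) * b) can be tabulated for candidate privacy loss budgets in advance.

```python
import math


def laplace_expected_abs_error(sensitivity: float, epsilon: float) -> float:
    """Expected absolute noise of the Laplace mechanism.

    The noise scale is b = sensitivity / epsilon, and E|noise| = b.
    Nothing here depends on the private data.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_rmse(sensitivity: float, epsilon: float) -> float:
    """Root-mean-square error: Laplace variance is 2 * b**2, so RMSE = sqrt(2) * b."""
    return math.sqrt(2) * laplace_expected_abs_error(sensitivity, epsilon)


def error_table(epsilons, sensitivity: float = 1.0):
    """Hypothetical helper: one (epsilon, expected |error|, RMSE) row per budget."""
    return [
        (eps,
         laplace_expected_abs_error(sensitivity, eps),
         laplace_rmse(sensitivity, eps))
        for eps in epsilons
    ]


if __name__ == "__main__":
    # Compare candidate privacy loss budgets without running on any data.
    for eps, mae, rmse in error_table([0.1, 0.5, 1.0]):
        print(f"epsilon={eps:<4} expected |error|={mae:.2f}  RMSE={rmse:.2f}")
```

In this simplified setting, halving epsilon doubles the expected error of a single noisy count, which is the kind of trade-off a policy maker would weigh when setting the budget.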
ISSN: 2150-8097
DOI: 10.14778/3352063.3352099