PSynDB: accurate and accessible private data generation
Published in: | Proceedings of the VLDB Endowment 2019-08, Vol.12 (12), p.1918-1921 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | Across many application domains, trusted parties who collect sensitive information need mechanisms to safely disseminate data. A favored approach is to generate synthetic data: a dataset similar to the original, hopefully retaining its statistical features, but one that does not reveal the private information of contributors to the data.
We present PSynDB, a web-based synthetic table generator that is built on recent privacy technologies [10,11,15]. PSynDB satisfies the formal guarantee of differential privacy and generates synthetic tables with high accuracy for tasks that the user specifies as important. PSynDB allows users to browse expected error rates before running the mechanism, a useful feature for making important policy decisions, such as setting the privacy loss budget. When the user has finished configuration, the tool outputs a data synthesis program that can be ported to a trusted environment. There it can be safely executed on the private data to produce the private synthetic dataset for broad dissemination. |
---|---|
ISSN: | 2150-8097 |
DOI: | 10.14778/3352063.3352099 |