A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data
Published in the Proceedings of the 33rd USENIX Security Symposium (USENIX Security 2024); please cite accordingly.
Saved in:
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Published in the Proceedings of the 33rd USENIX Security Symposium
(USENIX Security 2024), please cite accordingly Recent advances in synthetic data generation (SDG) have been hailed as a
solution to the difficult problem of sharing sensitive data while protecting
privacy. SDG aims to learn statistical properties of real data in order to
generate "artificial" data that are structurally and statistically similar to
sensitive data. Prior research, however, suggests that inference attacks on
synthetic data can undermine privacy, though only for specific outlier records. In
this work, we introduce a new attribute inference attack against synthetic
data. The attack is based on linear reconstruction methods for aggregate
statistics, which target all records in the dataset, not only outliers. We
evaluate our attack on state-of-the-art SDG algorithms, including Probabilistic
Graphical Models, Generative Adversarial Networks, and recent differentially
private SDG mechanisms. By defining a formal privacy game, we show that our
attack can be highly accurate even on arbitrary records, and that this is the
result of individual information leakage (as opposed to population-level
inference). We then systematically evaluate the tradeoff between protecting
privacy and preserving statistical utility. Our findings suggest that current
SDG methods cannot consistently provide sufficient privacy protection against
inference attacks while retaining reasonable utility. The best method
evaluated, a differentially private SDG mechanism, can provide both protection
against inference attacks and reasonable utility, but only in very specific
settings. Lastly, we show that releasing a larger number of synthetic records
can improve utility but at the cost of making attacks far more effective. |
---|---|
DOI: | 10.48550/arxiv.2301.10053 |
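The abstract's core idea, linear reconstruction from aggregate statistics, can be illustrated with a minimal toy sketch. This is not the paper's actual attack against synthetic data; it is a hypothetical Dinur-Nissim-style setup in which an adversary observes noisy subset-sum queries over a dataset and recovers the hidden binary attribute of every record by solving a least-squares system. All names, sizes, and noise levels below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (NOT the paper's attack): n records, each with a
# secret binary attribute; the adversary sees noisy counting queries over
# random subsets, released as aggregate statistics.
n = 100
secret = rng.integers(0, 2, size=n)          # hidden sensitive bits

m = 400                                      # number of aggregate queries
A = rng.integers(0, 2, size=(m, n))          # random subset-sum query matrix
noise = rng.normal(0.0, 2.0, size=m)         # per-query noise in the release
answers = A @ secret + noise                 # released noisy aggregates

# Linear reconstruction: solve the least-squares system A x ~ answers,
# then round each coordinate to {0, 1} to guess the secret attribute.
est, *_ = np.linalg.lstsq(A, answers, rcond=None)
guess = (est > 0.5).astype(int)

# Fraction of records whose secret attribute is correctly reconstructed.
accuracy = (guess == secret).mean()
```

With enough queries relative to the noise, reconstruction is near-perfect for all records, which mirrors the abstract's point that such attacks target arbitrary records rather than only outliers.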