A New Global Measure to Simultaneously Evaluate Data Utility and Privacy Risk

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, 2023, Vol. 18, pp. 715-729
Main authors: Jeong, Donghoon; Kim, Joseph H. T.; Im, Jongho
Format: Article
Language: English
Description
Summary: Measuring data utility and privacy risk embedded in synthetic or other de-identified datasets is an increasingly important research area. Existing measures in the data privacy literature, however, are one-sided in that they measure either utility or privacy risk only. In this paper we propose a new measure that can evaluate both data utility and privacy, a well-known trade-off relationship in data synthesis. The proposed measure employs the notion of relative distance between the synthetic and original datasets at the dataset level, and can identify the optimally balanced position of the synthetic data in terms of both utility and privacy. In addition, we devise a graphical tool that visually reveals the current utility-privacy trade-off position of the synthetic data. Numerical studies show that our new measure consistently performs better and offers richer interpretations than other existing global data utility measures, for both simulated and real datasets, confirming its distinctive advantages.
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2022.3228753