Selecting Optimal Subset to Release Under Differentially Private M-Estimators from Hybrid Datasets

Bibliographic details
Published in:	IEEE Transactions on Knowledge and Data Engineering, 2018-03, Vol. 30 (3), pp. 573-584
Main authors:	Wang, Meng; Ji, Zhanglong; Kim, Hyeon-Eui; Wang, Shuang; Xiong, Li; Jiang, Xiaoqian
Format:	Article
Language:	English
Subjects:	Computer simulation; Data models; Data privacy; Data retrieval; Datasets; Differential privacy; Estimators; Hybrid datasets; Interactive learning; Logistics; M-estimators; Privacy; Sensitivity; Sociology; Statistics

Description
Summary:	Privacy concerns in data sharing, especially for health data, are attracting increasing attention. Some patients now agree to open their information for research use, which raises a new question: how to use the public information effectively to better understand the private dataset without breaching privacy. In this paper, we specialize this question to selecting an optimal subset of the public dataset for M-estimators in the framework of differential privacy (DP) of [1]. From the perspective of non-interactive learning, we first construct a weighted private density estimate from the hybrid datasets under DP. Along the same lines as [2], we analyze the accuracy of the DP M-estimators based on the hybrid datasets. Our main contributions are: (i) we find that the bias-variance tradeoff in the performance of our M-estimators can be characterized in terms of the sample size of the released dataset; (ii) based on this finding, we develop an algorithm to select the optimal subset of the public dataset to release under DP. Our simulation studies and applications to real datasets confirm our findings and provide guidelines for real applications.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2017.2773545