Protect privacy of deep classification networks by exploiting their generative power

Bibliographic Details
Published in: Machine Learning, 2021-04, Vol. 110 (4), pp. 651-674
Main Authors: Chen, Jiyu; Guo, Yiwen; Zheng, Qianjun; Chen, Hao
Format: Article
Language: English
Online Access: Full text
Description
Summary: Research has shown that deep learning models are vulnerable to membership inference attacks, which aim to determine whether an example is in a model’s training set. We propose a new framework to defend against this sort of attack. Our key insight is that if we retrain the original classifier with a new dataset that is independent of the original training set but whose elements are sampled from the same distribution, the retrained classifier will leak no information about the original training set beyond what can be inferred from the distribution itself. Our framework consists of three phases. First, we transfer the original classifier to a Joint Energy-based Model (JEM) to exploit the model’s implicit generative power. Then, we sample from the JEM to create a new dataset. Finally, we use the new dataset to retrain or fine-tune the original classifier. We empirically studied different transfer learning schemes for the JEM and fine-tuning/retraining strategies for the classifier against shadow-model attacks. Our evaluation shows that our framework suppresses the attacker’s membership advantage to a negligible level while keeping the classifier’s accuracy acceptable. We compared it with other state-of-the-art defenses under adaptive attackers and showed that our defense remains effective even in the worst-case scenario. Moreover, we found that combining other defenses with our framework often achieves better robustness. Our code will be made available at https://github.com/ChenJiyu/meminf-defense.git.
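The framework's second phase relies on the JEM reinterpretation of a discriminative classifier (Grathwohl et al., 2020): the class logits f(x) define an unnormalized density via log p(x) = logsumexp_y f(x)[y] + C, which can be sampled with Stochastic Gradient Langevin Dynamics (SGLD). Below is a minimal sketch of that sampling step, assuming a PyTorch classifier; the function name sgld_sample, the uniform-noise initialization, and the hyperparameter values are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def sgld_sample(classifier: nn.Module, shape, n_steps: int = 100,
                step_size: float = 1.0, noise_std: float = 0.01) -> torch.Tensor:
    """Draw samples from the density a classifier implicitly defines as a JEM.

    Hypothetical sketch: names and hyperparameters are illustrative, not
    the paper's exact settings.
    """
    x = torch.rand(shape)  # start the Langevin chain from uniform noise
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        logits = classifier(x)
        # Under the JEM view, log p(x) equals the logsumexp of the class
        # logits up to a normalizing constant, which the gradient ignores.
        log_px = torch.logsumexp(logits, dim=1).sum()
        grad = torch.autograd.grad(log_px, x)[0]
        # Langevin update: ascend the log-density and inject Gaussian noise.
        x = x.detach() + step_size * grad + noise_std * torch.randn_like(x)
    return x.detach()
```

A surrogate dataset drawn this way, e.g. sgld_sample(model, (64, 3, 32, 32)) for a CIFAR-style model, is what the third phase would use to retrain or fine-tune the classifier in place of the original training set.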
ISSN: 0885-6125, 1573-0565
DOI: 10.1007/s10994-021-05951-6