Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference
Saved in:

| Author: | |
|---|---|
| Format: | Dissertation |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
Abstract: Predictive models built with machine learning algorithms are increasingly deployed in high-stakes contexts in various fields, such as medicine, finance, and law. Since these models rely heavily on sensitive personal data, regulatory principles that protect the privacy of training data are important. Concurrently, given the inherent complexity of models used in these high-stakes settings, model explanations are necessary for informing users about how models make decisions on their data. Explainability and privacy are often viewed as conflicting: explainability promotes transparency, while privacy places a limit on transparency.
In this thesis, we not only systematically study the under-addressed trade-off between deep learning explainability and privacy, but also push the boundaries of this trade-off: focusing on foundation models for image classification tasks, we reveal unforeseen privacy risks of post-hoc model explanations and subsequently offer strategies to mitigate those risks. First, we construct VAR-LRT and L1-LRT/L2-LRT, two novel membership inference attacks based on model predictions and explanations that are significantly more successful than existing attacks, particularly in the low false-positive-rate regime, where an adversary can identify specific training set members with high confidence. Second, we find empirically that optimized differentially private fine-tuning of foundation models substantially diminishes the success of the aforementioned attacks while maintaining competitive model accuracy. This portion of our work fills a gap in the literature: no prior work thoroughly quantifies the relationship between differential privacy and the resulting privacy risks of model explanations in the deep learning setting. Third, we compare the quality of explanations from differentially private models with that of their non-private counterparts. We propose statistical estimators for the consistency and prediction gap fidelity metrics that are suitable for high-dimensional, quantitative data settings, and we find empirical evidence of a trade-off between privacy strength and explanation quality; this trade-off has meaningful implications for researchers, policymakers, and legal scholars alike. Although differential privacy shows potential as a mitigation against training data leakage, further work is needed to understand how to optimize this privacy-versus-quality trade-off.
Overall, we carry out a rigorous empirical analysis w...
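To make the attack setting concrete, the sketch below illustrates a likelihood-ratio membership test on the L2 norm of a per-example explanation, in the spirit of the L2-LRT attack named in the abstract. The shadow-statistic setup, the Gaussian fit, and all names (`l2_lrt_score`, the placeholder norm values) are illustrative assumptions, not the thesis's exact algorithm.

```python
# Hypothetical sketch of an L2-norm likelihood-ratio membership test.
# The shadow-model statistics and Gaussian approximation are assumptions
# made for illustration; the thesis's attack may differ in detail.
import numpy as np
from scipy.stats import norm

def l2_lrt_score(target_norm, in_norms, out_norms, eps=1e-8):
    """Log-likelihood ratio of the target point's explanation L2 norm under
    Gaussians fit to 'member' vs. 'non-member' shadow statistics."""
    mu_in, sd_in = np.mean(in_norms), np.std(in_norms) + eps
    mu_out, sd_out = np.mean(out_norms), np.std(out_norms) + eps
    # Higher score => the observed norm looks more like a training member's.
    return norm.logpdf(target_norm, mu_in, sd_in) - norm.logpdf(target_norm, mu_out, sd_out)

# Placeholder shadow statistics: explanation norms when the point was in
# training ("in") versus held out ("out").
rng = np.random.default_rng(0)
in_norms = rng.normal(0.8, 0.1, 64)
out_norms = rng.normal(1.2, 0.2, 64)
print(l2_lrt_score(0.85, in_norms, out_norms))
```

Thresholding this score at a value calibrated on shadow data yields the low false-positive-rate operating points emphasized in the abstract.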
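As a companion sketch, the following shows one common recipe for differentially private fine-tuning with DP-SGD via the Opacus library. The toy data, the linear classification head, and the hyperparameters (noise multiplier, clipping norm, delta) are placeholders, not the configuration evaluated in the thesis.

```python
# Illustrative DP-SGD fine-tuning of a classification head with Opacus.
# All data and hyperparameters below are placeholders for illustration.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy "frozen backbone + linear head" setup standing in for a foundation model:
# pre-extracted embeddings and labels replace a real image pipeline.
features = torch.randn(512, 768)
labels = torch.randint(0, 10, (512,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=64)

head = nn.Linear(768, 10)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Wrap model, optimizer, and loader so training uses per-sample gradient
# clipping plus Gaussian noise (DP-SGD).
privacy_engine = PrivacyEngine()
head, optimizer, train_loader = privacy_engine.make_private(
    module=head,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # placeholder noise scale
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(head(x), y)
        loss.backward()
        optimizer.step()

# Report the privacy budget spent so far at a chosen delta.
print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```

Training only the head while treating the backbone as frozen keeps per-sample gradient clipping cheap, which is one common way to make differentially private fine-tuning of large models practical.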