SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Data used to train machine learning (ML) models can be sensitive. Membership
inference attacks (MIAs), attempting to determine whether a particular data
record was used to train an ML model, risk violating membership privacy. ML
model builders need a principled metric that (a) quantifies the membership
privacy risk of individual training data records, (b) is computed
independently of specific MIAs, (c) assesses susceptibility to different
MIAs, (d) can be used for different applications, and (e) is efficient to
compute. No prior membership privacy risk metric meets all of these
requirements simultaneously.
We present SHAPr, a membership privacy metric based on Shapley values, a
leave-one-out (LOO) technique originally intended to measure the
contribution of a training data record to model utility. We conjecture that
contribution to model utility can act as a proxy for memorization, and hence
represent membership privacy risk.
Using ten benchmark datasets, we show that SHAPr is indeed effective at
estimating the susceptibility of training data records to MIAs. We also show
that SHAPr is significantly better than prior work at estimating susceptibility
to newer and more effective MIAs. We apply SHAPr to evaluate the efficacy of
several defenses against MIAs: using regularization and removing high-risk
training data records. Moreover, SHAPr is versatile: it can be used for
estimating vulnerability of different subgroups to MIAs, and inherits
applications of Shapley values (e.g., data valuation). We show that SHAPr has
an acceptable computational cost (compared to naive LOO), varying from a few
minutes for the smallest dataset to ~92 minutes for the largest dataset. |
---|---|
DOI: | 10.48550/arxiv.2112.02230 |
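
The summary describes scoring each training record by its Shapley-value contribution to model utility and contrasts this with naive LOO. The following is a minimal illustrative sketch of that idea, not the authors' implementation: it computes exact Shapley values by enumerating every ordering of a tiny training set, which is only feasible for a handful of records. The 1-nearest-neighbour utility, the function names (`utility`, `shapley_values`, `leave_one_out`), and the toy data are all assumptions made for illustration.

```python
from itertools import permutations
from math import factorial


def utility(train_subset, test_set):
    """Test accuracy of a 1-nearest-neighbour classifier (assumed toy utility)."""
    if not train_subset:
        return 0.0
    correct = 0
    for x, y in test_set:
        # Predict the label of the closest training record.
        nearest = min(train_subset, key=lambda rec: abs(rec[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(test_set)


def shapley_values(train_set, test_set):
    """Exact Shapley value of each training record, averaged over all orderings."""
    n = len(train_set)
    values = [0.0] * n
    for order in permutations(range(n)):
        subset = []
        prev = utility(subset, test_set)
        for i in order:
            subset.append(train_set[i])
            cur = utility(subset, test_set)
            values[i] += cur - prev  # marginal contribution of record i
            prev = cur
    return [v / factorial(n) for v in values]


def leave_one_out(train_set, test_set):
    """Naive LOO score: utility drop when a single record is removed."""
    full = utility(train_set, test_set)
    return [
        full - utility(train_set[:i] + train_set[i + 1:], test_set)
        for i in range(len(train_set))
    ]


# Hypothetical toy data: (feature, label) pairs.
train = [(0, 0), (1, 0), (10, 1), (8, 0)]
test = [(2, 0), (11, 1), (7, 1)]

scores = shapley_values(train, test)
loo = leave_one_out(train, test)
for rec, s, l in zip(train, scores, loo):
    print(f"record={rec}  shapley={s:+.3f}  loo={l:+.3f}")
```

In this toy setting the first two records are near-duplicates, so naive LOO assigns the first one a score of zero, while its Shapley value is positive because it still contributes in smaller subsets. Under the summary's conjecture, records with higher scores would be the ones treated as more exposed to MIAs; whether that holds for any given model is an empirical question the sketch does not address.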