An Information Theoretic Evaluation Metric For Strong Unlearning
Main authors: , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Machine unlearning (MU) aims to remove the influence of specific data from trained models, addressing privacy concerns and ensuring compliance with regulations such as the "right to be forgotten." Evaluating strong unlearning, where the unlearned model is indistinguishable from one retrained without the forgetting data, remains a significant challenge in deep neural networks (DNNs). Common black-box metrics, such as variants of membership inference attacks and accuracy comparisons, primarily assess model outputs but often fail to capture residual information in intermediate layers. To bridge this gap, we introduce the Information Difference Index (IDI), a novel white-box metric inspired by information theory. IDI quantifies retained information in intermediate features by measuring mutual information between those features and the labels to be forgotten, offering a more comprehensive assessment of unlearning efficacy. Our experiments demonstrate that IDI effectively measures the degree of unlearning across various datasets and architectures, providing a reliable tool for evaluating strong unlearning in DNNs.
DOI: 10.48550/arxiv.2405.17878