Machine Unlearning for Document Classification
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | Document understanding models have recently demonstrated remarkable
performance by leveraging extensive collections of user documents. However,
since documents often contain large amounts of personal data, their usage can
pose a threat to user privacy and weaken the bonds of trust between humans and
AI services. In response to these concerns, legislation advocating "the right
to be forgotten" has recently been proposed, allowing users to request the
removal of private information from computer systems and neural network models.
A novel approach, known as machine unlearning, has emerged to make AI models
forget about a particular class of data. In our research, we explore machine
unlearning for document classification problems, representing, to the best of
our knowledge, the first investigation into this area. Specifically, we
consider a realistic scenario where a remote server houses a well-trained model
and possesses only a small portion of training data. This setup is designed for
efficient forgetting manipulation. This work represents a pioneering step
towards the development of machine unlearning methods aimed at addressing
privacy concerns in document analysis applications. Our code is publicly
available at
https://github.com/leitro/MachineUnlearning-DocClassification. |
---|---|
DOI: | 10.48550/arxiv.2404.19031 |