Delete My Account: Impact of Data Deletion on Machine Learning Classifiers
Proceedings of the 7th International Conference on Software Security and Assurance (ICSSA 2021), 2021, 7-20
Main authors: | , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Proceedings of the 7th International Conference on Software
Security and Assurance (ICSSA 2021), 2021, 7-20. Users are more aware than ever of the importance of their own data, thanks to
reports about security breaches and leaks of private, often sensitive data in
recent years. Additionally, the GDPR has been in effect in the European Union
for over three years and many people have encountered its effects in one way or
another. Consequently, more and more users are actively protecting their
personal data. One way to do this is to make use of the right to erasure guaranteed
in the GDPR, which has potential implications for a number of different fields,
such as big data and machine learning.
Our paper presents an in-depth analysis of the impact of the use of the
right to erasure on the performance of machine learning models on
classification tasks. We conduct various experiments utilising different
datasets as well as different machine learning algorithms to analyse a variety
of deletion behaviour scenarios. Due to the lack of credible data on actual
user behaviour, we make reasonable assumptions for various deletion modes and
biases and provide insight into the effects of different plausible scenarios
for right-to-erasure usage on data quality for machine learning. Our results
show that the impact depends strongly on the amount of data deleted, the
particular characteristics of the dataset, the bias chosen for deletion and the
assumptions on user behaviour. |
---|---|
DOI: | 10.48550/arxiv.2311.10385 |
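
The summary describes the experimental idea only in general terms: remove part of the training data under some deletion mode and bias, retrain, and compare classification performance. The following is a minimal sketch of that idea, not the authors' setup. The synthetic one-dimensional data, the nearest-centroid classifier, and all names (`make_data`, `delete_records`, `fit_centroids`, `accuracy`, `run_experiment`) are assumptions made purely for illustration.

```python
import random

def make_data(n_per_class, rng):
    """Hypothetical toy data: class 0 centred at 0.0, class 1 centred at 2.0."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data

def delete_records(train, fraction, rng, biased_class=None):
    """Simulate erasure requests removing `fraction` of the training records.

    biased_class=None draws the deletions uniformly at random;
    biased_class=c draws them only from records of class c (an assumed bias).
    """
    if biased_class is None:
        candidates = list(range(len(train)))
    else:
        candidates = [i for i, (_, y) in enumerate(train) if y == biased_class]
    n_delete = min(round(len(train) * fraction), len(candidates))
    dropped = set(rng.sample(candidates, n_delete))
    return [rec for i, rec in enumerate(train) if i not in dropped]

def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature value per class."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(centroids, test):
    """Share of test records whose nearest centroid matches their label."""
    correct = 0
    for x, y in test:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == y)
    return correct / len(test)

def run_experiment(seed=0):
    """Retrain after each deletion scenario and record the test accuracy."""
    rng = random.Random(seed)
    train = make_data(200, rng)
    test = make_data(200, rng)
    results = {}
    for fraction in (0.0, 0.25, 0.45):
        for bias in (None, 1):
            reduced = delete_records(train, fraction, rng, biased_class=bias)
            results[(fraction, bias)] = accuracy(fit_centroids(reduced), test)
    return results

for (fraction, bias), acc in run_experiment().items():
    mode = "uniform" if bias is None else f"class {bias} only"
    print(f"deleted {fraction:.0%} ({mode}): accuracy {acc:.3f}")
```

The deletion fraction and the bias are kept as separate parameters because the summary names the amount of deleted data and the deletion bias as distinct factors. How strongly either one matters in this toy setting says nothing about the datasets and algorithms studied in the paper.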