Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Data is essential for secondary use, but ensuring its privacy while allowing
such use is a critical challenge. Various techniques have been proposed to
address privacy concerns in data sharing and publishing. However, these methods
often degrade data utility, impacting the performance of machine learning (ML)
models. Our research identifies key limitations in existing optimization models
for privacy preservation, particularly in handling categorical variables,
assessing data utility, and evaluating effectiveness across diverse datasets.
We propose a novel multi-objective optimization model that simultaneously
minimizes information loss and maximizes protection against attacks. This model
is empirically validated using diverse datasets and compared with two existing
algorithms. We assess information loss, the number of individuals subject to
linkage or homogeneity attacks, and ML performance after anonymization. The
results indicate that our model achieves lower information loss and more
effectively mitigates the risk of attacks, reducing the number of individuals
susceptible to these attacks compared to alternative algorithms in some cases.
Additionally, our model maintains comparable ML performance relative to the
original data or data anonymized by other methods. Our findings highlight
significant improvements in privacy protection and ML model performance,
offering a comprehensive framework for balancing privacy and utility in data
sharing. |
---|---|
DOI: | 10.48550/arxiv.2501.01002 |
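
The summary reports the number of individuals subject to linkage or homogeneity attacks but does not say how those counts are obtained. The sketch below shows one common way such counts could be computed on an anonymized table; it is an illustration under stated assumptions, not the paper's method. It assumes that a record is linkage-exposed when fewer than `k` records share its quasi-identifier values, and homogeneity-exposed when its group has at least `k` records that all carry the same sensitive value. The function name `attack_exposure`, the column names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def attack_exposure(records, quasi_identifiers, sensitive, k=2):
    """Count records exposed to linkage and homogeneity attacks.

    Assumed definitions (not taken from the paper):
      - linkage: the record's equivalence class has fewer than k members
      - homogeneity: the class has at least k members, but every member
        shares the same sensitive value
    """
    # Group sensitive values by the tuple of quasi-identifier values.
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive])

    exposure = {"linkage": 0, "homogeneity": 0}
    for values in classes.values():
        if len(values) < k:
            exposure["linkage"] += len(values)
        elif len(set(values)) == 1:
            exposure["homogeneity"] += len(values)
    return exposure


# Hypothetical, already generalized sample data.
records = [
    {"age": "30-39", "zip": "481**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "481**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "482**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "482**", "diagnosis": "diabetes"},
    {"age": "50-59", "zip": "483**", "diagnosis": "flu"},
]

print(attack_exposure(records, ["age", "zip"], "diagnosis", k=2))
# {'linkage': 1, 'homogeneity': 2}
```

In this sample, the single record in the 50-59 group is linkage-exposed, and the two records in the 30-39 group are homogeneity-exposed because both carry the same diagnosis. Classes smaller than `k` are counted only under linkage so that no record is counted twice.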