Distributed L-diversity using spark-based algorithm for large resource description frameworks data
Privacy protection issues for resource description frameworks (RDFs) have emerged over the use of public government open data and the healthcare data of individuals. As these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the or...
Gespeichert in:
Veröffentlicht in: | The Journal of supercomputing 2021-07, Vol.77 (7), p.7270-7286 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Privacy protection issues for resource description frameworks (RDFs) have emerged over the use of public government open data and the healthcare data of individuals. As these data may include personal information, they must undergo a de-identification process that deletes or replaces parts of the original data. To enable these protections, a method has been developed to apply k-anonymization to RDF data. However, sensitive RDF information anonymized using k-anonymization is not completely secure and is vulnerable to attacks. In this paper, we propose an l-diversity anatomy de-identification method that can overcome the limitations of k-anonymity and guarantee stronger privacy protection than k-anonymization. Further, as this data anonymization process is computationally time-intensive, we use Spark distributed computing to provide rapid de-identification to enhance its utility. We also propose l-diversity preservation for dynamically evolving RDF datasets. Experimental results show that our proposed distributed l-diversity algorithm processes the data more efficiently than conventional approaches. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-020-03583-6 |