Estimation of cost of k–anonymity in the number of dummy records


Bibliographic details
Published in: Journal of ambient intelligence and humanized computing, 2023-12, Vol. 14 (12), p. 15885-15894
Main authors: Ito, Satoshi; Kikuchi, Hiroaki
Format: Article
Language: English
Description
Summary: De-identification is a process that prevents individuals from being identified from original transaction data by processing personally identifying information. k-anonymization, which processes data so that at least k users have the same records, is one of the representative methods of de-identification. One way to achieve k-anonymization is to add dummy records to the data to protect users who have unique histories. For this method, the cost of k-anonymization is the difference in the number of records between the original data and the processed data, and it can be calculated only after choosing the parameter k and processing the data. However, we want to calculate the cost before processing and find the optimal value of k, because processing big data for many values of k is very costly. In this paper, we propose a new model of transaction data that gives a probability distribution and an expected value for the values in the data, under the assumption that all values occur independently with uniform probability. Applying our data model, it is possible to evaluate the cost of k-anonymized data even before processing.
ISSN: 1868-5137, 1868-5145
DOI:10.1007/s12652-021-03369-5
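
Illustration: The cost notion in the summary (the number of dummy records added) can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions and is not the data model proposed in the paper. It assumes each user contributes exactly one record (a history), that a record seen fewer than k times is padded with dummies up to k, and that records never observed need no dummies. The function names `dummy_cost` and `expected_dummy_cost` and the parameters `n` (users) and `m` (possible distinct records) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import comb


def dummy_cost(records, k):
    """Count the dummy records needed so that every distinct record
    present in the data occurs at least k times (measured after the fact)."""
    counts = Counter(records)
    return sum(k - c for c in counts.values() if c < k)


def expected_dummy_cost(n, m, k):
    """Expected dummy count before processing, ASSUMING n users each hold
    one of m possible records, chosen independently and uniformly.
    The count of any given record is then Binomial(n, 1/m)."""
    p = 1.0 / m
    per_record = 0.0
    # Only counts 1..k-1 need padding; a count of 0 needs no dummies.
    for c in range(1, min(k, n + 1)):
        prob = comb(n, c) * p**c * (1 - p) ** (n - c)
        per_record += (k - c) * prob
    # Linearity of expectation over the m possible records.
    return m * per_record


# Hypothetical toy data: each tuple is one user's purchase history.
histories = [("tea",), ("tea",), ("milk", "tea"), ("rice",), ("rice",), ("rice",)]
cost_k2 = dummy_cost(histories, 2)
cost_k3 = dummy_cost(histories, 3)
estimate = expected_dummy_cost(2, 2, 2)
```

The first function needs the actual data and a chosen k, which mirrors the after-the-fact calculation the summary describes. The second shows, under the stated uniform-independence assumption, how an expected cost could be obtained from n, m, and k alone, so that several values of k can be compared without processing the data.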