rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm
Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the struct...
Gespeichert in:
Hauptverfasser: | , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Traditional approaches for data anonymization consider relational data and
textual data independently. We propose rx-anon, an anonymization approach for
heterogeneous semi-structured documents composed of relational and textual
attributes. We map sensitive terms extracted from the text to the structured
data. This allows us to use concepts like k-anonymity to generate a joined,
privacy-preserved version of the heterogeneous data input. We introduce the
concept of redundant sensitive information to consistently anonymize the
heterogeneous data. To control the influence of anonymization over unstructured
textual data versus structured data attributes, we introduce a modified,
parameterized Mondrian algorithm. The parameter $\lambda$ allows to give
different weight on the relational and textual attributes during the
anonymization process. We evaluate our approach with two real-world datasets
using a Normalized Certainty Penalty score, adapted to the problem of jointly
anonymizing relational and textual data. The results show that our approach is
capable of reducing information loss by using the tuning parameter to control
the Mondrian partitioning while guaranteeing k-anonymity for relational
attributes as well as for sensitive terms. As rx-anon is a framework approach,
it can be reused and extended by other anonymization algorithms, privacy
models, and textual similarity metrics. |
---|---|
DOI: | 10.48550/arxiv.2105.08842 |