rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm
Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the struct...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2022-12 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Singhofer, Fabian Garifullina, Aygul Kern, Mathias Scherp, Ansgar |
description | Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter \(\lambda\) allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2529614976</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2529614976</sourcerecordid><originalsourceid>FETCH-proquest_journals_25296149763</originalsourceid><addsrcrecordid>eNqNjcEKwjAQRIMgWNR_WPAcqKmteixW0YOevMtqtzZSs22Sip9vBD_A0zAzb5iBiFSSzOVqodRITJ17xHGssqVK0yQSnX1LNGxASsjhxC9qIG9by3irIcS-JihIHkoyXlf6hl6HlCvYkyfLdzLEvYMCPcIVHZXfEcKRy0AHd2RTWo0G8ubOVvv6ORHDChtH05-OxWy3PW_2Mpx2PTl_eXBvTaguKlXrbL5YL7PkP-oD6CJJcw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2529614976</pqid></control><display><type>article</type><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><source>Freely Accessible Journals</source><creator>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</creator><creatorcontrib>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</creatorcontrib><description>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter \(\lambda\) allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Parameter modification ; Privacy ; Unstructured data</subject><ispartof>arXiv.org, 2022-12</ispartof><rights>2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,785</link.rule.ids></links><search><creatorcontrib>Singhofer, Fabian</creatorcontrib><creatorcontrib>Garifullina, Aygul</creatorcontrib><creatorcontrib>Kern, Mathias</creatorcontrib><creatorcontrib>Scherp, Ansgar</creatorcontrib><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><title>arXiv.org</title><description>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter \(\lambda\) allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</description><subject>Algorithms</subject><subject>Parameter modification</subject><subject>Privacy</subject><subject>Unstructured data</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjcEKwjAQRIMgWNR_WPAcqKmteixW0YOevMtqtzZSs22Sip9vBD_A0zAzb5iBiFSSzOVqodRITJ17xHGssqVK0yQSnX1LNGxASsjhxC9qIG9by3irIcS-JihIHkoyXlf6hl6HlCvYkyfLdzLEvYMCPcIVHZXfEcKRy0AHd2RTWo0G8ubOVvv6ORHDChtH05-OxWy3PW_2Mpx2PTl_eXBvTaguKlXrbL5YL7PkP-oD6CJJcw</recordid><startdate>20221206</startdate><enddate>20221206</enddate><creator>Singhofer, Fabian</creator><creator>Garifullina, Aygul</creator><creator>Kern, Mathias</creator><creator>Scherp, Ansgar</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20221206</creationdate><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><author>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_25296149763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Parameter modification</topic><topic>Privacy</topic><topic>Unstructured data</topic><toplevel>online_resources</toplevel><creatorcontrib>Singhofer, Fabian</creatorcontrib><creatorcontrib>Garifullina, Aygul</creatorcontrib><creatorcontrib>Kern, Mathias</creatorcontrib><creatorcontrib>Scherp, Ansgar</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Singhofer, Fabian</au><au>Garifullina, Aygul</au><au>Kern, Mathias</au><au>Scherp, Ansgar</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</atitle><jtitle>arXiv.org</jtitle><date>2022-12-06</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter \(\lambda\) allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2529614976 |
source | Freely Accessible Journals |
subjects | Algorithms Parameter modification Privacy Unstructured data |
title | rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T07%3A06%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=rx-anon%20--%20A%20Novel%20Approach%20on%20the%20De-Identification%20of%20Heterogeneous%20Data%20based%20on%20a%20Modified%20Mondrian%20Algorithm&rft.jtitle=arXiv.org&rft.au=Singhofer,%20Fabian&rft.date=2022-12-06&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2529614976%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529614976&rft_id=info:pmid/&rfr_iscdi=true |