rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm

Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the struct...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2022-12
Hauptverfasser:	Singhofer, Fabian, Garifullina, Aygul, Kern, Mathias, Scherp, Ansgar
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Parameter modification Privacy Unstructured data
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Singhofer, Fabian Garifullina, Aygul Kern, Mathias Scherp, Ansgar
description	Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter $\lambda$ allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2529614976</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2529614976</sourcerecordid><originalsourceid>FETCH-proquest_journals_25296149763</originalsourceid><addsrcrecordid>eNqNjcEKwjAQRIMgWNR_WPAcqKmteixW0YOevMtqtzZSs22Sip9vBD_A0zAzb5iBiFSSzOVqodRITJ17xHGssqVK0yQSnX1LNGxASsjhxC9qIG9by3irIcS-JihIHkoyXlf6hl6HlCvYkyfLdzLEvYMCPcIVHZXfEcKRy0AHd2RTWo0G8ubOVvv6ORHDChtH05-OxWy3PW_2Mpx2PTl_eXBvTaguKlXrbL5YL7PkP-oD6CJJcw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2529614976</pqid></control><display><type>article</type><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><source>Freely Accessible Journals</source><creator>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</creator><creatorcontrib>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</creatorcontrib><description>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter $\lambda$ allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Parameter modification ; Privacy ; Unstructured data</subject><ispartof>arXiv.org, 2022-12</ispartof><rights>2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>781,785</link.rule.ids></links><search><creatorcontrib>Singhofer, Fabian</creatorcontrib><creatorcontrib>Garifullina, Aygul</creatorcontrib><creatorcontrib>Kern, Mathias</creatorcontrib><creatorcontrib>Scherp, Ansgar</creatorcontrib><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><title>arXiv.org</title><description>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter $\lambda$ allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</description><subject>Algorithms</subject><subject>Parameter modification</subject><subject>Privacy</subject><subject>Unstructured data</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjcEKwjAQRIMgWNR_WPAcqKmteixW0YOevMtqtzZSs22Sip9vBD_A0zAzb5iBiFSSzOVqodRITJ17xHGssqVK0yQSnX1LNGxASsjhxC9qIG9by3irIcS-JihIHkoyXlf6hl6HlCvYkyfLdzLEvYMCPcIVHZXfEcKRy0AHd2RTWo0G8ubOVvv6ORHDChtH05-OxWy3PW_2Mpx2PTl_eXBvTaguKlXrbL5YL7PkP-oD6CJJcw</recordid><startdate>20221206</startdate><enddate>20221206</enddate><creator>Singhofer, Fabian</creator><creator>Garifullina, Aygul</creator><creator>Kern, Mathias</creator><creator>Scherp, Ansgar</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20221206</creationdate><title>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</title><author>Singhofer, Fabian ; Garifullina, Aygul ; Kern, Mathias ; Scherp, Ansgar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_25296149763</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Parameter modification</topic><topic>Privacy</topic><topic>Unstructured data</topic><toplevel>online_resources</toplevel><creatorcontrib>Singhofer, Fabian</creatorcontrib><creatorcontrib>Garifullina, Aygul</creatorcontrib><creatorcontrib>Kern, Mathias</creatorcontrib><creatorcontrib>Scherp, Ansgar</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Singhofer, Fabian</au><au>Garifullina, Aygul</au><au>Kern, Mathias</au><au>Scherp, Ansgar</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm</atitle><jtitle>arXiv.org</jtitle><date>2022-12-06</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. The parameter $\lambda$ allows to give different weight on the relational and textual attributes during the anonymization process. We evaluate our approach with two real-world datasets using a Normalized Certainty Penalty score, adapted to the problem of jointly anonymizing relational and textual data. The results show that our approach is capable of reducing information loss by using the tuning parameter to control the Mondrian partitioning while guaranteeing k-anonymity for relational attributes as well as for sensitive terms. As rx-anon is a framework approach, it can be reused and extended by other anonymization algorithms, privacy models, and textual similarity metrics.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2022-12
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2529614976
source	Freely Accessible Journals
subjects	Algorithms Parameter modification Privacy Unstructured data
title	rx-anon -- A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T07%3A06%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=rx-anon%20--%20A%20Novel%20Approach%20on%20the%20De-Identification%20of%20Heterogeneous%20Data%20based%20on%20a%20Modified%20Mondrian%20Algorithm&rft.jtitle=arXiv.org&rft.au=Singhofer,%20Fabian&rft.date=2022-12-06&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2529614976%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2529614976&rft_id=info:pmid/&rfr_iscdi=true