Generalized bucketization scheme for flexible privacy settings

Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not r...

Detailed description

Bibliographic details
Published in: Information sciences 2016-06, Vol.348, p.377-393
Main authors: Wang, Ke, Wang, Peng, Fu, Ada Waichee, Wong, Raymond Chi-Wing
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 393
container_issue
container_start_page 377
container_title Information sciences
container_volume 348
creator Wang, Ke
Wang, Peng
Fu, Ada Waichee
Wong, Raymond Chi-Wing
description Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not require a taxonomy of attribute values, so is applicable to more data sets. A drawback of previous bucketization schemes is the uniform privacy setting and uniform bucket size, which often results in a non-achievable privacy goal or excessive information loss if sensitive values have variable sensitivity. In this work, we present a flexible bucketization scheme to address these issues. In the flexible scheme, each sensitive value can have its own privacy setting and buckets of different sizes can be formed. The challenge is to determine proper bucket sizes and group sensitive values into buckets so that the privacy setting of each sensitive value can be satisfied and overall information loss is minimized. We define the bucket setting problem to formalize this requirement. We present two efficient solutions to this problem. The first solution is optimal under the assumption that two different bucket sizes are allowed, and the second solution is heuristic without this assumption. We experimentally evaluate the effectiveness of this generalized bucketization scheme.
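The flexible setting described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it only shows what a bucketized publication looks like and how a per-value privacy setting might be checked. The sketch assumes that a privacy setting is an upper bound on the share of records in a bucket that carry a given sensitive value; the record does not state the paper's exact definition. The names `publish_bucketized`, `satisfies_settings`, and `max_share`, and all the data, are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def publish_bucketized(records, bucket_of):
    """Split records into two tables that are linked only by bucket id."""
    qi_table = [(rec["qi"], bucket_of[i]) for i, rec in enumerate(records)]
    # Sorting drops the original record order, so a row in one table
    # cannot be matched to a row in the other by position.
    sensitive_table = sorted(
        (bucket_of[i], rec["sensitive"]) for i, rec in enumerate(records)
    )
    return qi_table, sensitive_table


def satisfies_settings(sensitive_table, max_share):
    """Return True if, in every bucket, each sensitive value's share of the
    bucket is at or below that value's own threshold (assumed semantics)."""
    buckets = {}
    for bucket_id, value in sensitive_table:
        buckets.setdefault(bucket_id, []).append(value)
    for values in buckets.values():
        for value, count in Counter(values).items():
            if Fraction(count, len(values)) > max_share[value]:
                return False
    return True


# Toy data: quasi-identifiers are (age, zip code); all values are invented.
records = [
    {"qi": ("34", "80331"), "sensitive": "HIV"},
    {"qi": ("41", "80333"), "sensitive": "flu"},
    {"qi": ("29", "80335"), "sensitive": "flu"},
    {"qi": ("52", "80336"), "sensitive": "cold"},
    {"qi": ("38", "80337"), "sensitive": "flu"},
    {"qi": ("47", "80339"), "sensitive": "cold"},
]

# A stricter setting for the more sensitive value, looser ones for the rest.
max_share = {
    "HIV": Fraction(1, 4),
    "flu": Fraction(1, 2),
    "cold": Fraction(1, 2),
}

# Two buckets of different sizes: the HIV record sits in the larger one.
bucket_of = [0, 0, 0, 0, 1, 1]

qi_table, sensitive_table = publish_bucketized(records, bucket_of)
print(satisfies_settings(sensitive_table, max_share))
```

Under these assumptions, the record with the strict threshold is placed in the bucket of size four, while the less sensitive values can share a bucket of size two. A uniform bucket size of two could not satisfy the 1/4 threshold at all, and a uniform size of four would hide more associations than the looser settings require.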
doi_str_mv 10.1016/j.ins.2016.01.100
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1808110106</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0020025516300421</els_id><sourcerecordid>1808110106</sourcerecordid><originalsourceid>FETCH-LOGICAL-c330t-d2cb08b7e7c8d43443e4b230defb345761a4565daee72c7a29bd66b3747ff5503</originalsourceid><addsrcrecordid>eNp9kM1LAzEQxYMoWKt_gLc9etk6-dhkiyBI0SoUvOg55GNWU7e7NdkW27_elHr2NMPjvcfMj5BrChMKVN4uJ6FLE5bXCdAswQkZ0VqxUrIpPSUjAAYlsKo6JxcpLQFAKClH5H6OHUbThj36wm7cFw5hb4bQd0Vyn7jCoulj0bT4E2yLxTqGrXG7IuEwhO4jXZKzxrQJr_7mmLw_Pb7NnsvF6_xl9rAoHecwlJ45C7VVqFztBReCo7CMg8fGclEpSY2oZOUNomJOGTa1XkrLlVBNU1XAx-Tm2LuO_fcG06BXITlsW9Nhv0ma1lDTDAJkttKj1cU-pYiNzkevTNxpCvrASi91ZqUPrDTQLB3q744ZzD9sA0adXMDOoQ8R3aB9H_5J_wIXcXIO</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1808110106</pqid></control><display><type>article</type><title>Generalized bucketization scheme for flexible privacy settings</title><source>Elsevier ScienceDirect Journals</source><creator>Wang, Ke ; Wang, Peng ; Fu, Ada Waichee ; Wong, Raymond Chi-Wing</creator><creatorcontrib>Wang, Ke ; Wang, Peng ; Fu, Ada Waichee ; Wong, Raymond Chi-Wing</creatorcontrib><description>Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not require a taxonomy of attribute values, so is applicable to more data sets. A drawback of previous bucketization schemes is the uniform privacy setting and uniform bucket size, which often results in a non-achievable privacy goal or excessive information loss if sensitive values have variable sensitivity. In this work, we present a flexible bucketization scheme to address these issues. 
In the flexible scheme, each sensitive value can have its own privacy setting and buckets of different sizes can be formed. The challenge is to determine proper bucket sizes and group sensitive values into buckets so that the privacy setting of each sensitive value can be satisfied and overall information loss is minimized. We define the bucket setting problem to formalize this requirement. We present two efficient solutions to this problem. The first solution is optimal under the assumption that two different bucket sizes are allowed, and the second solution is heuristic without this assumption. We experimentally evaluate the effectiveness of this generalized bucketization scheme.</description><identifier>ISSN: 0020-0255</identifier><identifier>EISSN: 1872-6291</identifier><identifier>DOI: 10.1016/j.ins.2016.01.100</identifier><language>eng</language><publisher>Elsevier Inc</publisher><subject>Anonymity ; Bucketization ; Buckets ; Data publishing ; Disclosure ; Heuristic ; Optimization ; Privacy ; Taxonomy</subject><ispartof>Information sciences, 2016-06, Vol.348, p.377-393</ispartof><rights>2016 Elsevier Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c330t-d2cb08b7e7c8d43443e4b230defb345761a4565daee72c7a29bd66b3747ff5503</citedby><cites>FETCH-LOGICAL-c330t-d2cb08b7e7c8d43443e4b230defb345761a4565daee72c7a29bd66b3747ff5503</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0020025516300421$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Wang, Ke</creatorcontrib><creatorcontrib>Wang, Peng</creatorcontrib><creatorcontrib>Fu, Ada Waichee</creatorcontrib><creatorcontrib>Wong, Raymond Chi-Wing</creatorcontrib><title>Generalized 
bucketization scheme for flexible privacy settings</title><title>Information sciences</title><description>Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not require a taxonomy of attribute values, so is applicable to more data sets. A drawback of previous bucketization schemes is the uniform privacy setting and uniform bucket size, which often results in a non-achievable privacy goal or excessive information loss if sensitive values have variable sensitivity. In this work, we present a flexible bucketization scheme to address these issues. In the flexible scheme, each sensitive value can have its own privacy setting and buckets of different sizes can be formed. The challenge is to determine proper bucket sizes and group sensitive values into buckets so that the privacy setting of each sensitive value can be satisfied and overall information loss is minimized. We define the bucket setting problem to formalize this requirement. We present two efficient solutions to this problem. The first solution is optimal under the assumption that two different bucket sizes are allowed, and the second solution is heuristic without this assumption. 
We experimentally evaluate the effectiveness of this generalized bucketization scheme.</description><subject>Anonymity</subject><subject>Bucketization</subject><subject>Buckets</subject><subject>Data publishing</subject><subject>Disclosure</subject><subject>Heuristic</subject><subject>Optimization</subject><subject>Privacy</subject><subject>Taxonomy</subject><issn>0020-0255</issn><issn>1872-6291</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kM1LAzEQxYMoWKt_gLc9etk6-dhkiyBI0SoUvOg55GNWU7e7NdkW27_elHr2NMPjvcfMj5BrChMKVN4uJ6FLE5bXCdAswQkZ0VqxUrIpPSUjAAYlsKo6JxcpLQFAKClH5H6OHUbThj36wm7cFw5hb4bQd0Vyn7jCoulj0bT4E2yLxTqGrXG7IuEwhO4jXZKzxrQJr_7mmLw_Pb7NnsvF6_xl9rAoHecwlJ45C7VVqFztBReCo7CMg8fGclEpSY2oZOUNomJOGTa1XkrLlVBNU1XAx-Tm2LuO_fcG06BXITlsW9Nhv0ma1lDTDAJkttKj1cU-pYiNzkevTNxpCvrASi91ZqUPrDTQLB3q744ZzD9sA0adXMDOoQ8R3aB9H_5J_wIXcXIO</recordid><startdate>20160620</startdate><enddate>20160620</enddate><creator>Wang, Ke</creator><creator>Wang, Peng</creator><creator>Fu, Ada Waichee</creator><creator>Wong, Raymond Chi-Wing</creator><general>Elsevier Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20160620</creationdate><title>Generalized bucketization scheme for flexible privacy settings</title><author>Wang, Ke ; Wang, Peng ; Fu, Ada Waichee ; Wong, Raymond Chi-Wing</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c330t-d2cb08b7e7c8d43443e4b230defb345761a4565daee72c7a29bd66b3747ff5503</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Anonymity</topic><topic>Bucketization</topic><topic>Buckets</topic><topic>Data 
publishing</topic><topic>Disclosure</topic><topic>Heuristic</topic><topic>Optimization</topic><topic>Privacy</topic><topic>Taxonomy</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Ke</creatorcontrib><creatorcontrib>Wang, Peng</creatorcontrib><creatorcontrib>Fu, Ada Waichee</creatorcontrib><creatorcontrib>Wong, Raymond Chi-Wing</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Information sciences</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Ke</au><au>Wang, Peng</au><au>Fu, Ada Waichee</au><au>Wong, Raymond Chi-Wing</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Generalized bucketization scheme for flexible privacy settings</atitle><jtitle>Information sciences</jtitle><date>2016-06-20</date><risdate>2016</risdate><volume>348</volume><spage>377</spage><epage>393</epage><pages>377-393</pages><issn>0020-0255</issn><eissn>1872-6291</eissn><abstract>Bucketization is an anonymization technique for publishing sensitive data. The idea is to group records into small buckets to obscure the record-level association between sensitive information and identifying information. Compared to the traditional generalization technique, bucketization does not require a taxonomy of attribute values, so is applicable to more data sets. 
A drawback of previous bucketization schemes is the uniform privacy setting and uniform bucket size, which often results in a non-achievable privacy goal or excessive information loss if sensitive values have variable sensitivity. In this work, we present a flexible bucketization scheme to address these issues. In the flexible scheme, each sensitive value can have its own privacy setting and buckets of different sizes can be formed. The challenge is to determine proper bucket sizes and group sensitive values into buckets so that the privacy setting of each sensitive value can be satisfied and overall information loss is minimized. We define the bucket setting problem to formalize this requirement. We present two efficient solutions to this problem. The first solution is optimal under the assumption that two different bucket sizes are allowed, and the second solution is heuristic without this assumption. We experimentally evaluate the effectiveness of this generalized bucketization scheme.</abstract><pub>Elsevier Inc</pub><doi>10.1016/j.ins.2016.01.100</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0020-0255
ispartof Information sciences, 2016-06, Vol.348, p.377-393
issn 0020-0255
1872-6291
language eng
recordid cdi_proquest_miscellaneous_1808110106
source Elsevier ScienceDirect Journals
subjects Anonymity
Bucketization
Buckets
Data publishing
Disclosure
Heuristic
Optimization
Privacy
Taxonomy
title Generalized bucketization scheme for flexible privacy settings
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T23%3A25%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generalized%20bucketization%20scheme%20for%20flexible%20privacy%20settings&rft.jtitle=Information%20sciences&rft.au=Wang,%20Ke&rft.date=2016-06-20&rft.volume=348&rft.spage=377&rft.epage=393&rft.pages=377-393&rft.issn=0020-0255&rft.eissn=1872-6291&rft_id=info:doi/10.1016/j.ins.2016.01.100&rft_dat=%3Cproquest_cross%3E1808110106%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1808110106&rft_id=info:pmid/&rft_els_id=S0020025516300421&rfr_iscdi=true