A linear optimization-based method for data privacy in statistical tabular data

National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination, these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem t...

Bibliographic details
Published in: Optimization methods & software 2019-01, Vol.34 (1), p.37-61
Main authors: Castro, Jordi; González, José A.
Format: Article
Language: English
container_end_page 61
container_issue 1
container_start_page 37
container_title Optimization methods & software
container_volume 34
creator Castro, Jordi
González, José A.
description National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination, these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table which is closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed, and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open-source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed simplex in this particular application.
doi_str_mv 10.1080/10556788.2017.1332620
format Article
publisher Abingdon: Taylor & Francis
date 2019-01-02
tpages 25
rights 2017 Informa UK Limited, trading as Taylor & Francis Group
license Attribution-NonCommercial-NoDerivs 3.0 Spain, http://creativecommons.org/licenses/by-nc-nd/3.0/es/ (info:eu-repo/semantics/openAccess)
oa free_for_read
toplevel peer_reviewed
online_resources
orcid https://orcid.org/0000-0003-3573-4568
linktorsrc https://recercat.cat/handle/2072/293149 (record in Consorci de Serveis Universitaris de Catalunya (CSUC))
fulltext fulltext_linktorsrc
identifier ISSN: 1055-6788
ispartof Optimization methods & software, 2019-01, Vol.34 (1), p.37-61
issn 1055-6788
1029-4937
language eng
recordid cdi_csuc_recercat_oai_recercat_cat_2072_293149
source Recercat
subjects 90 Operations research, mathematical programming
90C Mathematical programming
benchmarking
Classificació AMS
data privacy
Data science
interior-point methods
Investigació operativa
lexicographic optimization
linear optimization
Matemàtiques i estadística
Mixed integer
Multiple objective analysis
Optimization
Solvers
statistical disclosure control
Tables (data)
Àrees temàtiques de la UPC
title A linear optimization-based method for data privacy in statistical tabular data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T10%3A23%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_XX2&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20linear%20optimization-based%20method%20for%20data%20privacy%20in%20statistical%20tabular%20data&rft.jtitle=Optimization%20methods%20&%20software&rft.au=Castro,%20Jordi&rft.date=2019-01-02&rft.volume=34&rft.issue=1&rft.spage=37&rft.epage=61&rft.pages=37-61&rft.issn=1055-6788&rft.eissn=1029-4937&rft_id=info:doi/10.1080/10556788.2017.1332620&rft_dat=%3Cproquest_XX2%3E2161334613%3C/proquest_XX2%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2161334613&rft_id=info:pmid/&rfr_iscdi=true