Evaluating the Risk of Disclosure and Utility in a Synthetic Dataset

The advancement of information technology has improved the delivery of financial services through the introduction of Financial Technology (FinTech). To enhance customer satisfaction, FinTech companies leverage artificial intelligence (AI) to collect fine-grained data about individuals, which enables them to provide more intelligent and customized services. However, while such services promise to make customers' lives easier, they also raise major security and privacy concerns for their users. Differential privacy (DP) is a common privacy-preserving data publishing technique that is proven to ensure a high level of privacy preservation. However, an important concern arises from the trade-off between data utility and the risk of data disclosure (RoD), which has not been well investigated. In this paper, to address this challenge, we propose data-dependent approaches for evaluating whether sufficient privacy is guaranteed in differentially private data release. At the same time, taking into account the utility of the differentially private synthetic dataset, we present a data-dependent algorithm that, through a curve-fitting technique, measures the error imposed on the statistical results of the original dataset by the injection of random noise. Moreover, we also propose a method that ensures a proper privacy budget ε, i.e., one chosen so as to maintain the trade-off between privacy and utility. Our comprehensive experimental analysis demonstrates both the efficiency and the estimation accuracy of the proposed algorithms.
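The paper's own algorithms are not reproduced in this record. As a generic illustration of the noise-injection step the abstract refers to, the sketch below adds Laplace noise to a counting query; the function names and ε values are our own illustrative choices, not the authors' method.

```python
import math
import random


def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from a zero-mean Laplace(scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def dp_count(data, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1, so Laplace noise with scale
    # 1/epsilon gives epsilon-differential privacy for this single query.
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon)


if __name__ == "__main__":
    incomes = [random.uniform(10_000, 120_000) for _ in range(1_000)]
    # Smaller epsilon -> stronger privacy -> noisier answer.
    for eps in (0.01, 0.1, 1.0):
        print(eps, dp_count(incomes, lambda x: x > 50_000, eps))
```

The noise scale 1/ε is exactly the source of the statistical error that the abstract's curve-fitting approach sets out to quantify.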

Bibliographic details
Published in: Computers, Materials & Continua, 2021-01, Vol. 68 (1), p. 761-787
Main authors: Chen, Kang-Cheng; Yu, Chia-Mu; Dargahi, Tooska
Format: Article
Language: English
Online access: Full text
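One generic way to realize the error-versus-budget curve fitting described in the abstract: for the Laplace mechanism the expected absolute error of a query scales as 1/ε, so errors observed at a few budgets can be fitted to err ≈ a/ε + b by ordinary least squares in the transformed variable x = 1/ε. This is a sketch under our own assumptions (the inverse-law model and function names are ours, not the authors' algorithm):

```python
def fit_inverse_law(epsilons, errors):
    # Least-squares fit of errors ≈ a / epsilon + b, which is linear in
    # the transformed variable x = 1 / epsilon.
    xs = [1.0 / e for e in epsilons]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(errors) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, errors))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def budget_for_error(a, b, target_error):
    # Invert the fitted curve to pick the smallest budget (strongest
    # privacy) that still meets a utility target.
    return a / (target_error - b)
```

Given measured errors at a few budgets, `budget_for_error` then selects an ε for a desired utility level, mirroring the privacy/utility trade-off the abstract discusses.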
DOI: 10.32604/cmc.2021.014984
Publisher: Tech Science Press (Henderson)
ISSN: 1546-2226, 1546-2218; EISSN: 1546-2226
Rights: 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Subjects: Algorithms; Artificial intelligence; Curve fitting; Customer satisfaction; Customer services; Datasets; Error analysis; Evaluation; Privacy; Random noise; Synthetic data; Tradeoffs