Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records

Preservation of private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly with services treating sensitive data, such as IT-based health services. Whereas anonymization techniques were shown to be prone to data re-identification, synthetic data...

Full description

Bibliographic details
Published in:	arXiv.org 2024-03
Main authors:	Ashrafi, Navid; Schmitt, Vera; Spang, Robert P; Möller, Sebastian; Voigt-Antons, Jan-Niklas
Format:	Article
Language:	eng
Subjects:	Acceptability; Data integrity; Generative adversarial networks; Health services; Leakage; Medical records; Prediction models; Privacy; Quality assessment; Synthetic data; Time series
Online access:	Full text

container_title	arXiv.org
creator	Ashrafi, Navid; Schmitt, Vera; Spang, Robert P; Möller, Sebastian; Voigt-Antons, Jan-Niklas
description	Preservation of private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly for services that handle sensitive data, such as IT-based health services. Whereas anonymization techniques have been shown to be prone to data re-identification, synthetic data generation has gradually replaced anonymization because it is less time- and resource-consuming and more robust to data leakage. Generative Adversarial Networks (GANs) have been used to generate synthetic datasets, especially GAN frameworks that adhere to differential privacy. This research compares state-of-the-art GAN-based models for generating synthetic time-series medical records of dementia patients that can be distributed without privacy concerns. Predictive modeling, autocorrelation, and distribution analysis are used to assess the Quality of Generating (QoG) of the generated data. The privacy preservation of the respective models is assessed by applying membership inference attacks to determine potential data leakage risks. Our experiments indicate the superiority of the privacy-preserving GAN (PPGAN) model over other models regarding privacy preservation while maintaining an acceptable level of QoG. The presented results can support better data protection for medical use cases in the future.
format	Article
publisher	Ithaca: Cornell University Library, arXiv.org
creationdate	2024-03-01
rights	2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa	free_for_read
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2931003471
source	Free E- Journals
subjects	Acceptability; Data integrity; Generative adversarial networks; Health services; Leakage; Medical records; Prediction models; Privacy; Quality assessment; Synthetic data; Time series
title	Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T21%3A50%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Protect%20and%20Extend%20--%20Using%20GANs%20for%20Synthetic%20Data%20Generation%20of%20Time-Series%20Medical%20Records&rft.jtitle=arXiv.org&rft.au=Ashrafi,%20Navid&rft.date=2024-03-01&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2931003471%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2931003471&rft_id=info:pmid/&rfr_iscdi=true