Synthesizing Mixed-type Electronic Health Records using Diffusion Models
Electronic Health Records (EHRs) contain sensitive patient information, which presents privacy concerns when sharing such data. Synthetic data generation is a promising solution to mitigate these risks, often relying on deep generative models such as Generative Adversarial Networks (GANs). However,...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Electronic Health Records (EHRs) contain sensitive patient information, which
presents privacy concerns when sharing such data. Synthetic data generation is
a promising solution to mitigate these risks, often relying on deep generative
models such as Generative Adversarial Networks (GANs). However, recent studies
have shown that diffusion models offer several advantages over GANs, such as
generation of more realistic synthetic data and stable training in generating
data modalities, including image, text, and sound. In this work, we investigate
the potential of diffusion models for generating realistic mixed-type tabular
EHRs, comparing TabDDPM model with existing methods on four datasets in terms
of data quality, utility, privacy, and augmentation. Our experiments
demonstrate that TabDDPM outperforms the state-of-the-art models across all
evaluation metrics, except for privacy, which confirms the trade-off between
privacy and utility. |
---|---|
DOI: | 10.48550/arxiv.2302.14679 |