Synthea™ Novel coronavirus (COVID-19) model and synthetic data set

Bibliographic details
Published in: Intelligence-based medicine 2020-11, Vol. 1-2, p. 100007, Article 100007
Main authors: Walonoski, Jason, Klaus, Sybil, Granger, Eldesia, Hall, Dylan, Gregorowicz, Andrew, Neyarapally, George, Watson, Abigail, Eastman, Jeff
Format: Article
Language: English
Description
Abstract: From March through May 2020, a model of novel coronavirus (COVID-19) disease progression and treatment was constructed for the open-source Synthea patient simulation. The model was built on three peer-reviewed publications from the early stages of the global pandemic, when less was known, along with emerging resources, data, publications, and clinical knowledge. The simulation outputs synthetic Electronic Health Records (EHR), including the daily consumption of Personal Protective Equipment (PPE) and other medical devices and supplies. For this simulation, we generated 124,150 synthetic patients, with 88,166 infections and 18,177 hospitalized patients. Patient symptoms, disease severity, and morbidity outcomes were calibrated using clinical data from the peer-reviewed publications. Of all simulated infected patients, 4.1% died and 20.6% were hospitalized. At peak observation, 548 dialysis machines and 209 mechanical ventilators were needed. This simulation and the resulting data have been used to develop algorithms and prototypes designed to address the current or future pandemics, and the model can continue to be refined to incorporate emerging COVID-19 knowledge, variations in patterns of care, and improvements in clinical outcomes. The resulting model, data, and analysis are available as open-source code on GitHub, and an open-access data set is available for download.
• The Synthea COVID-19 data set is a longitudinal set of synthetic COVID-19 patients and their EHRs.
• This data has been used in several online challenges, hackathons, and conferences related to the COVID-19 global pandemic.
• Synthetic data is a secure, privacy-preserving way to distribute realistic data to academic researchers, students, and practitioners, suitable for a variety of use cases.
• Synthetic data aligns with the principles of open science: open access, open source, open data.
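The headline figures above are simple summaries of the simulated records. The following minimal Python sketch is not the authors' analysis code. `rate` recomputes percentages from the counts reported in the abstract: 18,177 hospitalized out of 88,166 infected gives the stated 20.6%, and 88,166 infected out of 124,150 patients gives an infection share of about 71.0%, a derived figure the abstract does not state. The hypothetical `peak_concurrent` shows one common way (a sweep over sorted start/stop events) to obtain a peak number of devices in use at once. It assumes each device use is an inclusive start/stop date interval; that input format is an illustration, not Synthea's export schema, and the published peak counts may have been computed differently.

```python
from datetime import date, timedelta

# Counts reported in the abstract.
TOTAL_PATIENTS = 124_150
INFECTED = 88_166
HOSPITALIZED = 18_177


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def peak_concurrent(intervals):
    """Return the maximum number of intervals open on any single day.

    Hypothetical input format (an assumption, not Synthea's schema):
    each interval is an inclusive (start, stop) pair of datetime.date
    values; stop=None means the device is still in use at the end.
    """
    events = []
    for start, stop in intervals:
        events.append((start, 1))  # device acquired on its start day
        if stop is not None:
            # Released the day after its last day of use.
            events.append((stop + timedelta(days=1), -1))
    # Sorting by (day, delta) puts releases (-1) before acquisitions (+1)
    # on the same day, so back-to-back uses are not double-counted.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


print(rate(HOSPITALIZED, INFECTED))    # 20.6 (share of infected hospitalized)
print(rate(INFECTED, TOTAL_PATIENTS))  # 71.0 (share of all patients infected)

# Made-up device intervals, for illustration only.
demo = [
    (date(2020, 3, 1), date(2020, 3, 10)),
    (date(2020, 3, 5), date(2020, 3, 7)),
    (date(2020, 3, 6), date(2020, 3, 9)),
    (date(2020, 3, 8), None),
    (date(2020, 3, 11), date(2020, 3, 12)),
]
print(peak_concurrent(demo))  # 3
```

The sweep sorts 2n events once and scans them, so it runs in O(n log n) time regardless of how long each interval lasts, which matters when a data set spans many patients and months of daily device use.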
ISSN: 2666-5212
DOI: 10.1016/j.ibmed.2020.100007