Synthetically enhanced: unveiling synthetic data's potential in medical imaging research

Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enh...

Bibliographic details
Published in: EBioMedicine 2024-06, Vol.104, p.105174, Article 105174
Main authors: Khosravi, Bardia; Li, Frank; Dapamede, Theo; Rouzrokh, Pouria; Gamble, Cooper U.; Trivedi, Hari M.; Wyles, Cody C.; Sellergren, Andrew B.; Purkayastha, Saptarshi; Erickson, Bradley J.; Gichoya, Judy W.
Format: Article
Language: eng
Subjects: Chest radiographs; Data supplementation; Diffusion model; Generative AI; Synthetic data
Online access: Full text
container_end_page
container_issue
container_start_page 105174
container_title EBioMedicine
container_volume 104
creator Khosravi, Bardia
Li, Frank
Dapamede, Theo
Rouzrokh, Pouria
Gamble, Cooper U.
Trivedi, Hari M.
Wyles, Cody C.
Sellergren, Andrew B.
Purkayastha, Saptarshi
Erickson, Bradley J.
Gichoya, Judy W.
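description Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value <0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%–300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value <0.01). Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.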
doi_str_mv 10.1016/j.ebiom.2024.105174
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11177083</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S2352396424002093</els_id><sourcerecordid>3063464107</sourcerecordid><originalsourceid>FETCH-LOGICAL-c340t-72b70570cb74bfdf7bb6b86792536f138a3c8a87f18898e94a0087d03329340c3</originalsourceid><addsrcrecordid>eNp9kc1LKzEUxYM8UVH_AkFm996m9eZjJpkHIlL8AsGFCu5CJnOnTZlmajIt9L83Y1V04yrh5nfOPeQQckJhTIEWZ_MxVq5bjBkwkSY5lWKHHDCesxEvC_Hn232fHMc4BwCaizRUe2SfK8UoMHpAXh43vp9h76xp202Gfma8xfp_tvJrdK3z0yx-ElltevM3ZsuuR98702bOZwusB23mFmY60AEjmmBnR2S3MW3E44_zkDxfXz1Nbkf3Dzd3k8v7keUC-pFklYRcgq2kqJq6kVVVVKqQJct50VCuDLfKKNlQpUqFpTAAStbAOSuTgeWH5GLru1xVKYtNyYJp9TKkQGGjO-P0zxfvZnrarTWlVEpQPDn8-3AI3esKY68XLlpsW-OxW0XNoeCiEBRkQvkWtaGLMWDztYeCHnrRc_3eix560dtekur0e8QvzWcLCTjfApg-au0w6GgdDj24gLbXded-XfAG3sqgPg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3063464107</pqid></control><display><type>article</type><title>Synthetically enhanced: unveiling synthetic data's potential in medical imaging research</title><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><creator>Khosravi, Bardia ; Li, Frank ; Dapamede, Theo ; Rouzrokh, Pouria ; Gamble, Cooper U. ; Trivedi, Hari M. ; Wyles, Cody C. ; Sellergren, Andrew B. ; Purkayastha, Saptarshi ; Erickson, Bradley J. ; Gichoya, Judy W.</creator><creatorcontrib>Khosravi, Bardia ; Li, Frank ; Dapamede, Theo ; Rouzrokh, Pouria ; Gamble, Cooper U. ; Trivedi, Hari M. ; Wyles, Cody C. ; Sellergren, Andrew B. ; Purkayastha, Saptarshi ; Erickson, Bradley J. 
; Gichoya, Judy W.</creatorcontrib><description>Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value &lt;0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%–300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value &lt;0.01). Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. Dr. 
Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.</description><identifier>ISSN: 2352-3964</identifier><identifier>EISSN: 2352-3964</identifier><identifier>DOI: 10.1016/j.ebiom.2024.105174</identifier><identifier>PMID: 38821021</identifier><language>eng</language><publisher>Netherlands: Elsevier B.V</publisher><subject>Chest radiographs ; Data supplementation ; Diffusion model ; Generative AI ; Synthetic data</subject><ispartof>EBioMedicine, 2024-06, Vol.104, p.105174, Article 105174</ispartof><rights>2024 The Author(s)</rights><rights>Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.</rights><rights>2024 The Author(s) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c340t-72b70570cb74bfdf7bb6b86792536f138a3c8a87f18898e94a0087d03329340c3</cites><orcidid>0000-0002-8024-339X ; 0000-0002-1097-316X ; 0000-0002-8205-3397 ; 0000-0002-6264-2282 ; 0000-0002-8629-7567 ; 0000-0003-3625-534X ; 0009-0009-5139-4875</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11177083/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11177083/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,724,777,781,861,882,27905,27906,53772,53774</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38821021$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Khosravi, 
Bardia</creatorcontrib><creatorcontrib>Li, Frank</creatorcontrib><creatorcontrib>Dapamede, Theo</creatorcontrib><creatorcontrib>Rouzrokh, Pouria</creatorcontrib><creatorcontrib>Gamble, Cooper U.</creatorcontrib><creatorcontrib>Trivedi, Hari M.</creatorcontrib><creatorcontrib>Wyles, Cody C.</creatorcontrib><creatorcontrib>Sellergren, Andrew B.</creatorcontrib><creatorcontrib>Purkayastha, Saptarshi</creatorcontrib><creatorcontrib>Erickson, Bradley J.</creatorcontrib><creatorcontrib>Gichoya, Judy W.</creatorcontrib><title>Synthetically enhanced: unveiling synthetic data's potential in medical imaging research</title><title>EBioMedicine</title><addtitle>EBioMedicine</addtitle><description>Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value &lt;0.01 in all instances). 
When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%–300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value &lt;0.01). Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.</description><subject>Chest radiographs</subject><subject>Data supplementation</subject><subject>Diffusion model</subject><subject>Generative AI</subject><subject>Synthetic data</subject><issn>2352-3964</issn><issn>2352-3964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kc1LKzEUxYM8UVH_AkFm996m9eZjJpkHIlL8AsGFCu5CJnOnTZlmajIt9L83Y1V04yrh5nfOPeQQckJhTIEWZ_MxVq5bjBkwkSY5lWKHHDCesxEvC_Hn232fHMc4BwCaizRUe2SfK8UoMHpAXh43vp9h76xp202Gfma8xfp_tvJrdK3z0yx-ElltevM3ZsuuR98702bOZwusB23mFmY60AEjmmBnR2S3MW3E44_zkDxfXz1Nbkf3Dzd3k8v7keUC-pFklYRcgq2kqJq6kVVVVKqQJct50VCuDLfKKNlQpUqFpTAAStbAOSuTgeWH5GLru1xVKYtNyYJp9TKkQGGjO-P0zxfvZnrarTWlVEpQPDn8-3AI3esKY68XLlpsW-OxW0XNoeCiEBRkQvkWtaGLMWDztYeCHnrRc_3eix560dtekur0e8QvzWcLCTjfApg-au0w6GgdDj24gLbXded-XfAG3sqgPg</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Khosravi, Bardia</creator><creator>Li, Frank</creator><creator>Dapamede, Theo</creator><creator>Rouzrokh, Pouria</creator><creator>Gamble, Cooper U.</creator><creator>Trivedi, Hari M.</creator><creator>Wyles, Cody 
C.</creator><creator>Sellergren, Andrew B.</creator><creator>Purkayastha, Saptarshi</creator><creator>Erickson, Bradley J.</creator><creator>Gichoya, Judy W.</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-8024-339X</orcidid><orcidid>https://orcid.org/0000-0002-1097-316X</orcidid><orcidid>https://orcid.org/0000-0002-8205-3397</orcidid><orcidid>https://orcid.org/0000-0002-6264-2282</orcidid><orcidid>https://orcid.org/0000-0002-8629-7567</orcidid><orcidid>https://orcid.org/0000-0003-3625-534X</orcidid><orcidid>https://orcid.org/0009-0009-5139-4875</orcidid></search><sort><creationdate>20240601</creationdate><title>Synthetically enhanced: unveiling synthetic data's potential in medical imaging research</title><author>Khosravi, Bardia ; Li, Frank ; Dapamede, Theo ; Rouzrokh, Pouria ; Gamble, Cooper U. ; Trivedi, Hari M. ; Wyles, Cody C. ; Sellergren, Andrew B. ; Purkayastha, Saptarshi ; Erickson, Bradley J. 
; Gichoya, Judy W.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c340t-72b70570cb74bfdf7bb6b86792536f138a3c8a87f18898e94a0087d03329340c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Chest radiographs</topic><topic>Data supplementation</topic><topic>Diffusion model</topic><topic>Generative AI</topic><topic>Synthetic data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Khosravi, Bardia</creatorcontrib><creatorcontrib>Li, Frank</creatorcontrib><creatorcontrib>Dapamede, Theo</creatorcontrib><creatorcontrib>Rouzrokh, Pouria</creatorcontrib><creatorcontrib>Gamble, Cooper U.</creatorcontrib><creatorcontrib>Trivedi, Hari M.</creatorcontrib><creatorcontrib>Wyles, Cody C.</creatorcontrib><creatorcontrib>Sellergren, Andrew B.</creatorcontrib><creatorcontrib>Purkayastha, Saptarshi</creatorcontrib><creatorcontrib>Erickson, Bradley J.</creatorcontrib><creatorcontrib>Gichoya, Judy W.</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>EBioMedicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Khosravi, Bardia</au><au>Li, Frank</au><au>Dapamede, Theo</au><au>Rouzrokh, Pouria</au><au>Gamble, Cooper U.</au><au>Trivedi, Hari M.</au><au>Wyles, Cody C.</au><au>Sellergren, Andrew B.</au><au>Purkayastha, Saptarshi</au><au>Erickson, Bradley J.</au><au>Gichoya, Judy W.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Synthetically enhanced: unveiling synthetic data's potential in medical imaging 
research</atitle><jtitle>EBioMedicine</jtitle><addtitle>EBioMedicine</addtitle><date>2024-06-01</date><risdate>2024</risdate><volume>104</volume><spage>105174</spage><pages>105174-</pages><artnum>105174</artnum><issn>2352-3964</issn><eissn>2352-3964</eissn><abstract>Chest X-rays (CXR) are essential for diagnosing a variety of conditions, but when used on new populations, model generalizability issues limit their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images, enhancing dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of medical imaging research. The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement training datasets for pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and various experiments, including supplementing real data with synthetic data, training with purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating curve (AUROC). Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value &lt;0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%–300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value &lt;0.01). 
Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging. Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.</abstract><cop>Netherlands</cop><pub>Elsevier B.V</pub><pmid>38821021</pmid><doi>10.1016/j.ebiom.2024.105174</doi><orcidid>https://orcid.org/0000-0002-8024-339X</orcidid><orcidid>https://orcid.org/0000-0002-1097-316X</orcidid><orcidid>https://orcid.org/0000-0002-8205-3397</orcidid><orcidid>https://orcid.org/0000-0002-6264-2282</orcidid><orcidid>https://orcid.org/0000-0002-8629-7567</orcidid><orcidid>https://orcid.org/0000-0003-3625-534X</orcidid><orcidid>https://orcid.org/0009-0009-5139-4875</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2352-3964
ispartof EBioMedicine, 2024-06, Vol.104, p.105174, Article 105174
issn 2352-3964
2352-3964
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11177083
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Alma/SFX Local Collection
subjects Chest radiographs
Data supplementation
Diffusion model
Generative AI
Synthetic data
title Synthetically enhanced: unveiling synthetic data's potential in medical imaging research
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T15%3A21%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Synthetically%20enhanced:%20unveiling%20synthetic%20data's%20potential%20in%20medical%20imaging%20research&rft.jtitle=EBioMedicine&rft.au=Khosravi,%20Bardia&rft.date=2024-06-01&rft.volume=104&rft.spage=105174&rft.pages=105174-&rft.artnum=105174&rft.issn=2352-3964&rft.eissn=2352-3964&rft_id=info:doi/10.1016/j.ebiom.2024.105174&rft_dat=%3Cproquest_pubme%3E3063464107%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3063464107&rft_id=info:pmid/38821021&rft_els_id=S2352396424002093&rfr_iscdi=true
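
The description field of this record reports results in terms of AUROC and of synthetic data supplementation percentages. As a rough illustration only, the sketch below shows how those two quantities could be computed. It is not the authors' code. The helper names `synthetic_count` and `auroc` are hypothetical, and reading "1000% supplementation" as "synthetic images equal to ten times the real training set size" is an assumption that the record itself does not spell out. The AUROC here uses the standard pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half).

```python
from typing import Sequence


def synthetic_count(n_real: int, supplementation_pct: float) -> int:
    """Number of synthetic images to add for a given supplementation percentage.

    Assumption (not stated in the record): the percentage is relative to the
    size of the real training set, so 1000% means ten synthetic images per
    real image.
    """
    return round(n_real * supplementation_pct / 100)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair contributes 1 if the positive case has the
    higher score, 0.5 on a tie, and 0 otherwise. This is quadratic in the
    number of cases, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, under the stated assumption a real training set of 1,000 images with 1000% supplementation would receive `synthetic_count(1000, 1000)` synthetic images, and `auroc(labels, scores)` can be evaluated separately on an internal and an external test set to compare generalizability.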