MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images

Bibliographic details
Published in: IEEE transactions on medical imaging 2024-10, Vol.43 (10), p.3648-3660
Main authors: Xu, Yanwu, Sun, Li, Peng, Wei, Jia, Shuyue, Morrison, Katelyn, Perer, Adam, Zandifar, Afrooz, Visweswaran, Shyam, Eslami, Motahhare, Batmanghelich, Kayhan
Format: Article
Language: eng
container_end_page 3660
container_issue 10
container_start_page 3648
container_title IEEE transactions on medical imaging
container_volume 43
creator Xu, Yanwu
Sun, Li
Peng, Wei
Jia, Shuyue
Morrison, Katelyn
Perer, Adam
Zandifar, Afrooz
Visweswaran, Shyam
Eslami, Motahhare
Batmanghelich, Kayhan
description This paper introduces a method for producing high-quality 3D lung CT images guided by text. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underuse the abundant information in radiology reports. Radiology reports can enhance the generation process by providing additional guidance and fine-grained control over image synthesis. However, extending text-guided generation to high-resolution 3D images poses significant challenges in memory usage and in preserving anatomical detail. To address the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture: we first synthesize low-resolution images conditioned on the text, which then serve as the foundation for subsequent generators that produce the complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks jointly with the CT images. The model can use both textual input and the segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations by 10 board-certified radiologists indicate that our approach outperforms state-of-the-art GAN- and diffusion-based models, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This study focuses on two main objectives: (1) developing a method for generating images from textual prompts and anatomical components, and (2) generating new images conditioned on anatomical elements. These advances in image generation can be applied to enhance numerous downstream tasks.
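The memory argument behind the hierarchical scheme in the description can be made concrete with a back-of-the-envelope count. The sketch below is illustrative only: it assumes a 256³ full-resolution volume, a 64³ low-resolution first stage, float32 voxels, and one CT channel plus three mask channels (vessels, airways, lobes). These sizes and the helper name `volume_bytes` are hypothetical and are not taken from the paper.

```python
# Back-of-the-envelope memory count for a single 3-D volume tensor.
# All sizes below are illustrative assumptions, not the paper's settings.

BYTES_PER_FLOAT32 = 4


def volume_bytes(side: int, channels: int = 1,
                 bytes_per_voxel: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store one cubic volume of side**3 voxels per channel."""
    return side ** 3 * channels * bytes_per_voxel


# Hypothetical channel layout: CT intensity + vessel, airway and lobe masks.
CHANNELS = 1 + 3

full_res = volume_bytes(256, CHANNELS)  # assumed full-resolution target
low_res = volume_bytes(64, CHANNELS)    # assumed low-resolution first stage
ratio = full_res // low_res             # 4x smaller per axis -> 4**3 fewer voxels

if __name__ == "__main__":
    print(f"full-res tensor: {full_res / 2**20:.0f} MiB")
    print(f"low-res tensor:  {low_res / 2**20:.0f} MiB")
    print(f"ratio: {ratio}x")
```

Halving or quartering the side length shrinks the voxel count cubically, so under these assumptions the low-resolution stage handles 64 times less data per tensor. Because the intermediate activations of a convolutional UNet also grow with the number of voxels, a similar factor roughly carries over to training memory, which is why starting from a coarse text-conditioned volume is attractive.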
doi_str_mv 10.1109/TMI.2024.3415032
format Article
publisher United States: IEEE
eissn 1558-254X
pmid 38900619
coden ITMID4
addtitle TMI
IEEE Trans Med Imaging
tpages 13
orcidid https://orcid.org/0000-0001-9893-9136
https://orcid.org/0000-0001-5676-012X
https://orcid.org/0000-0002-8369-3847
https://orcid.org/0000-0002-2079-8684
https://orcid.org/0000-0002-5809-1318
https://orcid.org/0000-0002-1499-3045
https://orcid.org/0000-0002-2892-5764
linktohtml https://ieeexplore.ieee.org/document/10566053
backlink https://www.ncbi.nlm.nih.gov/pubmed/38900619
fulltext fulltext_linktorsrc
identifier ISSN: 0278-0062
ispartof IEEE transactions on medical imaging, 2024-10, Vol.43 (10), p.3648-3660
issn 0278-0062
1558-254X
language eng
recordid cdi_ieee_primary_10566053
source IEEE Electronic Library (IEL)
subjects 3D image generation
Algorithms
Atmospheric modeling
Biomedical imaging
Computed tomography
controllable synthesis
Diffusion model
Humans
Image synthesis
Imaging, Three-Dimensional - methods
Lung
Lung - diagnostic imaging
lung CT
Radiology
text-guided image generation
Three-dimensional displays
Tomography, X-Ray Computed - methods
volume synthesis with radiology report
title MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T03%3A28%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MedSyn:%20Text-Guided%20Anatomy-Aware%20Synthesis%20of%20High-Fidelity%203-D%20CT%20Images&rft.jtitle=IEEE%20transactions%20on%20medical%20imaging&rft.au=Xu,%20Yanwu&rft.date=2024-10&rft.volume=43&rft.issue=10&rft.spage=3648&rft.epage=3660&rft.pages=3648-3660&rft.issn=0278-0062&rft.eissn=1558-254X&rft.coden=ITMID4&rft_id=info:doi/10.1109/TMI.2024.3415032&rft_dat=%3Cproquest_RIE%3E3070837230%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3070837230&rft_id=info:pmid/38900619&rft_ieee_id=10566053&rfr_iscdi=true