MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images
This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize rad...
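Published in: | IEEE transactions on medical imaging 2024-10, Vol.43 (10), p.3648-3660 |
---|---|
Main authors: | Xu, Yanwu ; Sun, Li ; Peng, Wei ; Jia, Shuyue ; Morrison, Katelyn ; Perer, Adam ; Zandifar, Afrooz ; Visweswaran, Shyam ; Eslami, Motahhare ; Batmanghelich, Kayhan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |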
container_end_page | 3660 |
---|---|
container_issue | 10 |
container_start_page | 3648 |
container_title | IEEE transactions on medical imaging |
container_volume | 43 |
creator | Xu, Yanwu ; Sun, Li ; Peng, Wei ; Jia, Shuyue ; Morrison, Katelyn ; Perer, Adam ; Zandifar, Afrooz ; Visweswaran, Shyam ; Eslami, Motahhare ; Batmanghelich, Kayhan |
description | This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks. |
doi_str_mv | 10.1109/TMI.2024.3415032 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_10566053</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10566053</ieee_id><sourcerecordid>3070837230</sourcerecordid><originalsourceid>FETCH-LOGICAL-c203t-12d6cf65543e00fe7232c202791029faf339876ff1a6e05e78cc432f32694e3f3</originalsourceid><addsrcrecordid>eNpNkL9PwzAQhS0EoqWwMyCUkcXl7IuTmK0q9AdqxUCQ2KKQnNugpoE4EeS_x1ULYjrp3vfe8DF2KWAoBOjbeDkfSpD-EH2hAOUR6wulIi6V_3rM-iDDiAMEssfOrH0HEL4Cfcp6GGn3FrrPHpeUP3fbOy-m74ZP2yKn3Btt06YqOz76SmvyXNysyRbWq4w3K1ZrPnHUpmg6D_m9N469eZmuyJ6zE5NuLF0c7oC9TB7i8Ywvnqbz8WjBMwnYcCHzIDOBUj4SgKFQonSJDLUAqU1qEHUUBsaINCBQFEZZ5qM0KAPtExocsJv97kddfbZkm6QsbEabTbqlqrUJQggRullwKOzRrK6srckkH3VRpnWXCEh2BhNnMNkZTA4GXeX6sN6-lZT_FX6VOeBqDxRE9G9PBQEoxB9h-nGx</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3070837230</pqid></control><display><type>article</type><title>MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images</title><source>IEEE Electronic Library (IEL)</source><creator>Xu, Yanwu ; Sun, Li ; Peng, Wei ; Jia, Shuyue ; Morrison, Katelyn ; Perer, Adam ; Zandifar, Afrooz ; Visweswaran, Shyam ; Eslami, Motahhare ; Batmanghelich, Kayhan</creator><creatorcontrib>Xu, Yanwu ; Sun, Li ; Peng, Wei ; Jia, Shuyue ; Morrison, Katelyn ; Perer, Adam ; Zandifar, Afrooz ; Visweswaran, Shyam ; Eslami, Motahhare ; Batmanghelich, Kayhan</creatorcontrib><description>This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. 
Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. 
The advancements in image generation can be applied to enhance numerous downstream tasks.</description><identifier>ISSN: 0278-0062</identifier><identifier>ISSN: 1558-254X</identifier><identifier>EISSN: 1558-254X</identifier><identifier>DOI: 10.1109/TMI.2024.3415032</identifier><identifier>PMID: 38900619</identifier><identifier>CODEN: ITMID4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>3D image generation ; Algorithms ; Atmospheric modeling ; Biomedical imaging ; Computed tomography ; controllable synthesis ; Diffusion model ; Humans ; Image synthesis ; Imaging, Three-Dimensional - methods ; Lung ; Lung - diagnostic imaging ; lung CT ; Radiology ; text-guided image generation ; Three-dimensional displays ; Tomography, X-Ray Computed - methods ; volume synthesis with radiology report</subject><ispartof>IEEE transactions on medical imaging, 2024-10, Vol.43 (10), p.3648-3660</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c203t-12d6cf65543e00fe7232c202791029faf339876ff1a6e05e78cc432f32694e3f3</cites><orcidid>0000-0001-9893-9136 ; 0000-0001-5676-012X ; 0000-0002-8369-3847 ; 0000-0002-2079-8684 ; 0000-0002-5809-1318 ; 0000-0002-1499-3045 ; 0000-0002-2892-5764</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10566053$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,793,27905,27906,54739</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10566053$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38900619$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xu, Yanwu</creatorcontrib><creatorcontrib>Sun, Li</creatorcontrib><creatorcontrib>Peng, Wei</creatorcontrib><creatorcontrib>Jia, 
Shuyue</creatorcontrib><creatorcontrib>Morrison, Katelyn</creatorcontrib><creatorcontrib>Perer, Adam</creatorcontrib><creatorcontrib>Zandifar, Afrooz</creatorcontrib><creatorcontrib>Visweswaran, Shyam</creatorcontrib><creatorcontrib>Eslami, Motahhare</creatorcontrib><creatorcontrib>Batmanghelich, Kayhan</creatorcontrib><title>MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images</title><title>IEEE transactions on medical imaging</title><addtitle>TMI</addtitle><addtitle>IEEE Trans Med Imaging</addtitle><description>This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. 
Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.</description><subject>3D image generation</subject><subject>Algorithms</subject><subject>Atmospheric modeling</subject><subject>Biomedical imaging</subject><subject>Computed tomography</subject><subject>controllable synthesis</subject><subject>Diffusion model</subject><subject>Humans</subject><subject>Image synthesis</subject><subject>Imaging, Three-Dimensional - methods</subject><subject>Lung</subject><subject>Lung - diagnostic imaging</subject><subject>lung CT</subject><subject>Radiology</subject><subject>text-guided image generation</subject><subject>Three-dimensional displays</subject><subject>Tomography, X-Ray Computed - methods</subject><subject>volume synthesis with radiology 
report</subject><issn>0278-0062</issn><issn>1558-254X</issn><issn>1558-254X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpNkL9PwzAQhS0EoqWwMyCUkcXl7IuTmK0q9AdqxUCQ2KKQnNugpoE4EeS_x1ULYjrp3vfe8DF2KWAoBOjbeDkfSpD-EH2hAOUR6wulIi6V_3rM-iDDiAMEssfOrH0HEL4Cfcp6GGn3FrrPHpeUP3fbOy-m74ZP2yKn3Btt06YqOz76SmvyXNysyRbWq4w3K1ZrPnHUpmg6D_m9N469eZmuyJ6zE5NuLF0c7oC9TB7i8Ywvnqbz8WjBMwnYcCHzIDOBUj4SgKFQonSJDLUAqU1qEHUUBsaINCBQFEZZ5qM0KAPtExocsJv97kddfbZkm6QsbEabTbqlqrUJQggRullwKOzRrK6srckkH3VRpnWXCEh2BhNnMNkZTA4GXeX6sN6-lZT_FX6VOeBqDxRE9G9PBQEoxB9h-nGx</recordid><startdate>202410</startdate><enddate>202410</enddate><creator>Xu, Yanwu</creator><creator>Sun, Li</creator><creator>Peng, Wei</creator><creator>Jia, Shuyue</creator><creator>Morrison, Katelyn</creator><creator>Perer, Adam</creator><creator>Zandifar, Afrooz</creator><creator>Visweswaran, Shyam</creator><creator>Eslami, Motahhare</creator><creator>Batmanghelich, Kayhan</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-9893-9136</orcidid><orcidid>https://orcid.org/0000-0001-5676-012X</orcidid><orcidid>https://orcid.org/0000-0002-8369-3847</orcidid><orcidid>https://orcid.org/0000-0002-2079-8684</orcidid><orcidid>https://orcid.org/0000-0002-5809-1318</orcidid><orcidid>https://orcid.org/0000-0002-1499-3045</orcidid><orcidid>https://orcid.org/0000-0002-2892-5764</orcidid></search><sort><creationdate>202410</creationdate><title>MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images</title><author>Xu, Yanwu ; Sun, Li ; Peng, Wei ; Jia, Shuyue ; Morrison, Katelyn ; Perer, Adam ; Zandifar, Afrooz ; Visweswaran, Shyam ; Eslami, Motahhare ; 
Batmanghelich, Kayhan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c203t-12d6cf65543e00fe7232c202791029faf339876ff1a6e05e78cc432f32694e3f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>3D image generation</topic><topic>Algorithms</topic><topic>Atmospheric modeling</topic><topic>Biomedical imaging</topic><topic>Computed tomography</topic><topic>controllable synthesis</topic><topic>Diffusion model</topic><topic>Humans</topic><topic>Image synthesis</topic><topic>Imaging, Three-Dimensional - methods</topic><topic>Lung</topic><topic>Lung - diagnostic imaging</topic><topic>lung CT</topic><topic>Radiology</topic><topic>text-guided image generation</topic><topic>Three-dimensional displays</topic><topic>Tomography, X-Ray Computed - methods</topic><topic>volume synthesis with radiology report</topic><toplevel>online_resources</toplevel><creatorcontrib>Xu, Yanwu</creatorcontrib><creatorcontrib>Sun, Li</creatorcontrib><creatorcontrib>Peng, Wei</creatorcontrib><creatorcontrib>Jia, Shuyue</creatorcontrib><creatorcontrib>Morrison, Katelyn</creatorcontrib><creatorcontrib>Perer, Adam</creatorcontrib><creatorcontrib>Zandifar, Afrooz</creatorcontrib><creatorcontrib>Visweswaran, Shyam</creatorcontrib><creatorcontrib>Eslami, Motahhare</creatorcontrib><creatorcontrib>Batmanghelich, Kayhan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on medical 
imaging</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xu, Yanwu</au><au>Sun, Li</au><au>Peng, Wei</au><au>Jia, Shuyue</au><au>Morrison, Katelyn</au><au>Perer, Adam</au><au>Zandifar, Afrooz</au><au>Visweswaran, Shyam</au><au>Eslami, Motahhare</au><au>Batmanghelich, Kayhan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images</atitle><jtitle>IEEE transactions on medical imaging</jtitle><stitle>TMI</stitle><addtitle>IEEE Trans Med Imaging</addtitle><date>2024-10</date><risdate>2024</risdate><volume>43</volume><issue>10</issue><spage>3648</spage><epage>3660</epage><pages>3648-3660</pages><issn>0278-0062</issn><issn>1558-254X</issn><eissn>1558-254X</eissn><coden>ITMID4</coden><abstract>This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. 
The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38900619</pmid><doi>10.1109/TMI.2024.3415032</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-9893-9136</orcidid><orcidid>https://orcid.org/0000-0001-5676-012X</orcidid><orcidid>https://orcid.org/0000-0002-8369-3847</orcidid><orcidid>https://orcid.org/0000-0002-2079-8684</orcidid><orcidid>https://orcid.org/0000-0002-5809-1318</orcidid><orcidid>https://orcid.org/0000-0002-1499-3045</orcidid><orcidid>https://orcid.org/0000-0002-2892-5764</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0278-0062 |
ispartof | IEEE transactions on medical imaging, 2024-10, Vol.43 (10), p.3648-3660 |
issn | 0278-0062 ; 1558-254X ; 1558-254X |
language | eng |
recordid | cdi_ieee_primary_10566053 |
source | IEEE Electronic Library (IEL) |
subjects | 3D image generation ; Algorithms ; Atmospheric modeling ; Biomedical imaging ; Computed tomography ; controllable synthesis ; Diffusion model ; Humans ; Image synthesis ; Imaging, Three-Dimensional - methods ; Lung ; Lung - diagnostic imaging ; lung CT ; Radiology ; text-guided image generation ; Three-dimensional displays ; Tomography, X-Ray Computed - methods ; volume synthesis with radiology report |
title | MedSyn: Text-Guided Anatomy-Aware Synthesis of High-Fidelity 3-D CT Images |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T03%3A28%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MedSyn:%20Text-Guided%20Anatomy-Aware%20Synthesis%20of%20High-Fidelity%203-D%20CT%20Images&rft.jtitle=IEEE%20transactions%20on%20medical%20imaging&rft.au=Xu,%20Yanwu&rft.date=2024-10&rft.volume=43&rft.issue=10&rft.spage=3648&rft.epage=3660&rft.pages=3648-3660&rft.issn=0278-0062&rft.eissn=1558-254X&rft.coden=ITMID4&rft_id=info:doi/10.1109/TMI.2024.3415032&rft_dat=%3Cproquest_RIE%3E3070837230%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3070837230&rft_id=info:pmid/38900619&rft_ieee_id=10566053&rfr_iscdi=true |