LViT: Language meets Vision Transformer in Medical Image Segmentation

Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data, owing to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo-label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that the proposed LViT has superior segmentation performance in both fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
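
The abstract describes an Exponential Pseudo-label Iteration (EPI) mechanism that iteratively refines pseudo labels for unlabeled images. As an illustration of that idea only, the sketch below implements one plausible reading of such a mechanism: an exponential moving-average update of a per-pixel pseudo-label probability map. The function name `epi_update`, the smoothing factor `beta`, and the final thresholding step are assumptions made for this example and are not taken from the paper; see https://github.com/HUANGLIZI/LViT for the authors' implementation.

```python
import numpy as np

def epi_update(prev_pseudo: np.ndarray, current_pred: np.ndarray, beta: float = 0.9) -> np.ndarray:
    """Blend the previous pseudo-label map with the model's current prediction.

    Both inputs are per-pixel foreground probabilities of the same shape.
    beta controls how much of the running estimate is kept; 0.9 is an
    illustrative default, not a value reported in the paper.
    """
    return beta * prev_pseudo + (1.0 - beta) * current_pred

# Toy usage: refine a pseudo-label map over a few training iterations.
rng = np.random.default_rng(0)
pseudo = np.zeros((4, 4))                # running pseudo-label probabilities
for _ in range(3):
    pred = rng.random((4, 4))            # stand-in for a segmentation model's output
    pseudo = epi_update(pseudo, pred)
mask = (pseudo > 0.5).astype(np.uint8)   # threshold into a binary segmentation mask
```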

Bibliographic Details

Published in: IEEE transactions on medical imaging, 2024-01, Vol. 43 (1), p. 1-1
Main authors: Li, Zihan; Li, Yunxiang; Li, Qingde; Wang, Puyang; Guo, Dazhou; Lu, Le; Jin, Dakai; Zhang, You; Hong, Qingqi
Format: Article
Language: English
DOI: 10.1109/TMI.2023.3291719
ISSN: 0278-0062
EISSN: 1558-254X
PMID: 37399157
Source: IEEE Electronic Library (IEL)

Subjects:
Biomedical imaging
Computed tomography
Convolutional neural networks
Data models
Datasets
Deep learning
Feature extraction
Image annotation
Image processing
Image Processing, Computer-Assisted
Image quality
Image segmentation
Iterative methods
Labels
Language
Medical image segmentation
Medical imaging
Semi-supervised learning
Supervised Machine Learning
Transformers
Vision
Vision-Language
Visualization