LViT: Language meets Vision Transformer in Medical Image Segmentation

Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data, owing to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo-label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that the proposed LViT has superior segmentation performance in both fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
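
The abstract describes an Exponential Pseudo-label Iteration (EPI) mechanism that iteratively refines pseudo labels for unlabeled images. As an illustration of that idea only, the sketch below implements one plausible reading of such a mechanism: an exponential moving-average update of a per-pixel pseudo-label probability map. The function name `epi_update`, the smoothing factor `beta`, and the final thresholding step are assumptions made for this example and are not taken from the paper; see https://github.com/HUANGLIZI/LViT for the authors' implementation.

```python
import numpy as np

def epi_update(prev_pseudo: np.ndarray, current_pred: np.ndarray, beta: float = 0.9) -> np.ndarray:
    """Blend the previous pseudo-label map with the model's current prediction.

    Both inputs are per-pixel foreground probabilities of the same shape.
    beta controls how much of the running estimate is kept; 0.9 is an
    illustrative default, not a value reported in the paper.
    """
    return beta * prev_pseudo + (1.0 - beta) * current_pred

# Toy usage: refine a pseudo-label map over a few training iterations.
rng = np.random.default_rng(0)
pseudo = np.zeros((4, 4))                # running pseudo-label probabilities
for _ in range(3):
    pred = rng.random((4, 4))            # stand-in for a segmentation model's output
    pseudo = epi_update(pseudo, pred)
mask = (pseudo > 0.5).astype(np.uint8)   # threshold into a binary segmentation mask
```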

Bibliographic Details

Published in: IEEE transactions on medical imaging, 2024-01, Vol. 43 (1), p. 1-1
Main authors: Li, Zihan; Li, Yunxiang; Li, Qingde; Wang, Puyang; Guo, Dazhou; Lu, Le; Jin, Dakai; Zhang, You; Hong, Qingqi
Format: Article
Language: English
DOI: 10.1109/TMI.2023.3291719
ISSN: 0278-0062
EISSN: 1558-254X
PMID: 37399157
Source: IEEE Electronic Library (IEL)

Subjects:
Biomedical imaging
Computed tomography
Convolutional neural networks
Data models
Datasets
Deep learning
Feature extraction
Image annotation
Image processing
Image Processing, Computer-Assisted
Image quality
Image segmentation
Iterative methods
Labels
Language
Medical image segmentation
Medical imaging
Semi-supervised learning
Supervised Machine Learning
Transformers
Vision
Vision-Language
Visualization