LViT: Language meets Vision Transformer in Medical Image Segmentation
Deep learning has been widely used in medical image segmentation and other aspects. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data due to the prohibitive data annotation cost. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer).
Saved in:
Published in: | IEEE transactions on medical imaging 2024-01, Vol.43 (1), p.1-1 |
---|---|
Main authors: | Li, Zihan; Li, Yunxiang; Li, Qingde; Wang, Puyang; Guo, Dazhou; Lu, Le; Jin, Dakai; Zhang, You; Hong, Qingqi |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 1 |
---|---|
container_issue | 1 |
container_start_page | 1 |
container_title | IEEE transactions on medical imaging |
container_volume | 43 |
creator | Li, Zihan; Li, Yunxiang; Li, Qingde; Wang, Puyang; Guo, Dazhou; Lu, Le; Jin, Dakai; Zhang, You; Hong, Qingqi |
description | Deep learning has been widely used in medical image segmentation and other aspects. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data due to the prohibitive data annotation cost. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of pseudo labels of improved quality in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that our proposed LViT has superior segmentation performance in both fully-supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT. |
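The Exponential Pseudo label Iteration (EPI) mechanism described above refines pseudo labels across training iterations. As a minimal illustrative sketch (not the authors' implementation), assuming pseudo labels are stored as per-pixel class-probability maps and the iteration is an exponential-moving-average update with an illustrative smoothing factor `beta`:

```python
import numpy as np

def epi_update(prev_probs: np.ndarray, new_probs: np.ndarray,
               beta: float = 0.9) -> np.ndarray:
    """Exponentially smooth per-pixel pseudo-label probabilities.

    prev_probs, new_probs: arrays of shape (H, W, C) holding class
    probabilities from the accumulated estimate and the current model.
    beta controls how much of the accumulated estimate is kept.
    """
    return beta * prev_probs + (1.0 - beta) * new_probs

def to_hard_labels(probs: np.ndarray) -> np.ndarray:
    """Collapse smoothed probabilities to a per-pixel class map."""
    return probs.argmax(axis=-1)
```

Smoothing over iterations damps single-iteration prediction noise, which is one plausible way such an iteration mechanism could stabilize pseudo labels for unlabeled images; the exact EPI update rule is defined in the paper and the linked code repository.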
doi_str_mv | 10.1109/TMI.2023.3291719 |
format | Article |
fullrecord | ProQuest/IEEE Electronic Library (IEL) record. Title: LViT: Language meets Vision Transformer in Medical Image Segmentation. Authors: Li, Zihan; Li, Yunxiang; Li, Qingde; Wang, Puyang; Guo, Dazhou; Lu, Le; Jin, Dakai; Zhang, You; Hong, Qingqi. Published in: IEEE transactions on medical imaging, 2024-01, Vol.43 (1), p.1-1. Publisher: IEEE (United States); rights: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024. ISSN: 0278-0062; EISSN: 1558-254X. DOI: 10.1109/TMI.2023.3291719. PMID: 37399157. CODEN: ITMID4. IEEE document ID: 10172039; ProQuest ID: 2909271318. Subjects: Biomedical imaging; Computed tomography; Convolutional neural networks; Data models; Datasets; Deep learning; Feature extraction; Image annotation; Image processing; Image Processing, Computer-Assisted; Image quality; Image segmentation; Iterative methods; Labels; Language; Medical image segmentation; Medical imaging; Semi-supervised learning; Supervised Machine Learning; Transformers; Vision; Vision-Language; Visualization. ORCID iDs: 0000-0003-2657-6051; 0000-0002-9996-6870; 0000-0002-8033-2755; 0009-0004-3839-0611; 0000-0003-0622-4710; 0000-0001-5998-7565. Full text: https://ieeexplore.ieee.org/document/10172039. PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37399157 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0278-0062 |
ispartof | IEEE transactions on medical imaging, 2024-01, Vol.43 (1), p.1-1 |
issn | 0278-0062; 1558-254X (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_2833023014 |
source | IEEE Electronic Library (IEL) |
subjects | Biomedical imaging; Computed tomography; Convolutional neural networks; Data models; Datasets; Deep learning; Feature extraction; Image annotation; Image processing; Image Processing, Computer-Assisted; Image quality; Image segmentation; Iterative methods; Labels; Language; Medical image segmentation; Medical imaging; Semi-supervised learning; Supervised Machine Learning; Transformers; Vision; Vision-Language; Visualization |
title | LViT: Language meets Vision Transformer in Medical Image Segmentation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T12%3A41%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LViT:%20Language%20meets%20Vision%20Transformer%20in%20Medical%20Image%20Segmentation&rft.jtitle=IEEE%20transactions%20on%20medical%20imaging&rft.au=Li,%20Zihan&rft.date=2024-01-01&rft.volume=43&rft.issue=1&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=0278-0062&rft.eissn=1558-254X&rft.coden=ITMID4&rft_id=info:doi/10.1109/TMI.2023.3291719&rft_dat=%3Cproquest_RIE%3E2909271318%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2909271318&rft_id=info:pmid/37399157&rft_ieee_id=10172039&rfr_iscdi=true |