Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures tha...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2024-12
Hauptverfasser:	Dutta, Pallabi, Bose, Soham, Roy, Swalpa Kumar, Mitra, Sushmita
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial neural networks Feature maps Image reconstruction Image segmentation Medical imaging Task complexity
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Dutta, Pallabi Bose, Soham Roy, Swalpa Kumar Mitra, Sushmita
description	The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM)s by introducing the novel {\it \textbf{U-VixLSTM}}. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks, to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance, compared to the state-of-the-art networks in the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_3072355200</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3072355200</sourcerecordid><originalsourceid>FETCH-proquest_journals_30723552003</originalsourceid><addsrcrecordid>eNqNjEELgjAYQEcQJOV_-KCzsLaW3SLKKMgOaV1lti-ZzK2cQj8_D_2ATu_wHm9EAsb5IlovGZuQ0PuaUspWMROCB-SybRHu2mtn4XPO8hSSpkSlUMHtgh2kbvBXNFqWBkFbSFHphzTA93BqZIWQYdWg7WQ3LDYzMn5K4zH8cUrmhyTfHaNX6949-q6oXd_aQRWcxowLwSjl_1Vfrqw79w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3072355200</pqid></control><display><type>article</type><title>Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?</title><source>Free E- Journals</source><creator>Dutta, Pallabi ; Bose, Soham ; Roy, Swalpa Kumar ; Mitra, Sushmita</creator><creatorcontrib>Dutta, Pallabi ; Bose, Soham ; Roy, Swalpa Kumar ; Mitra, Sushmita</creatorcontrib><description>The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM)s by introducing the novel {\it \textbf{U-VixLSTM}}. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks, to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance, compared to the state-of-the-art networks in the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Artificial neural networks ; Feature maps ; Image reconstruction ; Image segmentation ; Medical imaging ; Task complexity</subject><ispartof>arXiv.org, 2024-12</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Dutta, Pallabi</creatorcontrib><creatorcontrib>Bose, Soham</creatorcontrib><creatorcontrib>Roy, Swalpa Kumar</creatorcontrib><creatorcontrib>Mitra, Sushmita</creatorcontrib><title>Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?</title><title>arXiv.org</title><description>The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM)s by introducing the novel {\it \textbf{U-VixLSTM}}. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks, to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance, compared to the state-of-the-art networks in the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM</description><subject>Artificial neural networks</subject><subject>Feature maps</subject><subject>Image reconstruction</subject><subject>Image segmentation</subject><subject>Medical imaging</subject><subject>Task complexity</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjEELgjAYQEcQJOV_-KCzsLaW3SLKKMgOaV1lti-ZzK2cQj8_D_2ATu_wHm9EAsb5IlovGZuQ0PuaUspWMROCB-SybRHu2mtn4XPO8hSSpkSlUMHtgh2kbvBXNFqWBkFbSFHphzTA93BqZIWQYdWg7WQ3LDYzMn5K4zH8cUrmhyTfHaNX6949-q6oXd_aQRWcxowLwSjl_1Vfrqw79w</recordid><startdate>20241218</startdate><enddate>20241218</enddate><creator>Dutta, Pallabi</creator><creator>Bose, Soham</creator><creator>Roy, Swalpa Kumar</creator><creator>Mitra, Sushmita</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20241218</creationdate><title>Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?</title><author>Dutta, Pallabi ; Bose, Soham ; Roy, Swalpa Kumar ; Mitra, Sushmita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_30723552003</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Artificial neural networks</topic><topic>Feature maps</topic><topic>Image reconstruction</topic><topic>Image segmentation</topic><topic>Medical imaging</topic><topic>Task complexity</topic><toplevel>online_resources</toplevel><creatorcontrib>Dutta, Pallabi</creatorcontrib><creatorcontrib>Bose, Soham</creatorcontrib><creatorcontrib>Roy, Swalpa Kumar</creatorcontrib><creatorcontrib>Mitra, Sushmita</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dutta, Pallabi</au><au>Bose, Soham</au><au>Roy, Swalpa Kumar</au><au>Mitra, Sushmita</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?</atitle><jtitle>arXiv.org</jtitle><date>2024-12-18</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM)s by introducing the novel {\it \textbf{U-VixLSTM}}. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks, to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance, compared to the state-of-the-art networks in the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-12
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_3072355200
source	Free E- Journals
subjects	Artificial neural networks Feature maps Image reconstruction Image segmentation Medical imaging Task complexity
title	Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T18%3A16%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Are%20Vision%20xLSTM%20Embedded%20UNet%20More%20Reliable%20in%20Medical%203D%20Image%20Segmentation?&rft.jtitle=arXiv.org&rft.au=Dutta,%20Pallabi&rft.date=2024-12-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3072355200%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3072355200&rft_id=info:pmid/&rfr_iscdi=true