Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM) by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM
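The abstract outlines a three-stage data flow: a convolutional path extracts feature maps, Vision-xLSTM blocks model global relationships over the resulting patch sequence, and a convolutional reconstruction path upsamples back to the segmentation output. The minimal PyTorch sketch below only illustrates that data flow; it is not the authors' implementation (see the linked repository). A plain nn.LSTM stands in for the Vision-xLSTM blocks, and all layer widths, the class count, and the input size are illustrative assumptions.

```python
# Hypothetical structural sketch of a CNN -> sequence-model -> upsampling hybrid,
# in the spirit of the abstract's description. Not the authors' U-VixLSTM code.
import torch
import torch.nn as nn


class HybridSegSketch(nn.Module):
    def __init__(self, in_ch=1, base_ch=16, num_classes=9):
        super().__init__()
        # Convolutional feature-extraction path.
        self.enc1 = nn.Sequential(nn.Conv3d(in_ch, base_ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(base_ch, 2 * base_ch, 3, stride=2, padding=1), nn.ReLU())
        # Stand-in for the Vision-xLSTM bottleneck: the downsampled feature map is
        # flattened into a patch sequence and processed by a recurrent layer that
        # mixes information globally across the volume.
        self.seq = nn.LSTM(input_size=2 * base_ch, hidden_size=2 * base_ch, batch_first=True)
        # Convolutional feature-reconstruction path (upsampling) and segmentation head.
        self.up = nn.ConvTranspose3d(2 * base_ch, base_ch, kernel_size=2, stride=2)
        self.head = nn.Conv3d(2 * base_ch, num_classes, kernel_size=1)

    def forward(self, x):
        f1 = self.enc1(x)                       # (B, C, D, H, W)
        f2 = self.enc2(f1)                      # (B, 2C, D/2, H/2, W/2)
        b, c, d, h, w = f2.shape
        tokens = f2.flatten(2).transpose(1, 2)  # (B, N, 2C) patch/voxel sequence
        tokens, _ = self.seq(tokens)            # global context over the sequence
        f2 = tokens.transpose(1, 2).reshape(b, c, d, h, w)
        up = self.up(f2)                        # back to the input resolution
        return self.head(torch.cat([up, f1], dim=1))  # skip connection + class logits


if __name__ == "__main__":
    out = HybridSegSketch()(torch.randn(1, 1, 16, 32, 32))
    print(out.shape)  # torch.Size([1, 9, 16, 32, 32])
```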


Bibliographic Details
Published in: arXiv.org, 2024-12
Main Authors: Dutta, Pallabi; Bose, Soham; Roy, Swalpa Kumar; Mitra, Sushmita
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Artificial neural networks; Feature maps; Image reconstruction; Image segmentation; Medical imaging; Task complexity
Online Access: Full text