xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart
Format: Article
Language: English
Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report we first propose xLSTM-UNet, a UNet-structured deep learning network that leverages Vision-LSTM (ViL), a vision adaptation of xLSTM, as its backbone for medical image segmentation. xLSTM was recently proposed as the successor of Long Short-Term Memory (LSTM) networks and has demonstrated performance superior to Transformers and State Space Models (SSMs) such as Mamba in Natural Language Processing (NLP) and image classification (as demonstrated by the Vision-LSTM, or ViL, implementation). Here, xLSTM-UNet extends this success to the biomedical image segmentation domain. By integrating the local feature extraction strengths of convolutional layers with the long-range dependency-capturing abilities of xLSTM, xLSTM-UNet offers a robust solution for comprehensive image analysis. We validate the efficacy of xLSTM-UNet through experiments. Our findings demonstrate that xLSTM-UNet consistently surpasses leading CNN-based, Transformer-based, and Mamba-based segmentation networks on multiple biomedical segmentation datasets, including organs in abdominal MRI, instruments in endoscopic images, and cells in microscopic images. With comprehensive experiments performed, this technical report highlights the potential of xLSTM-based architectures for advancing biomedical image analysis in both 2D and 3D. The code, models, and datasets are publicly available at http://tianrun-chen.github.io/xLSTM-UNet/
DOI: 10.48550/arxiv.2407.01530
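The abstract's core design idea, pairing convolutional layers for local feature extraction with an xLSTM-style gated recurrence for long-range dependencies, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that); all function names here are hypothetical, and the recurrence is a heavily simplified sLSTM-style scan over a 1D sequence of patch tokens, assuming the image has already been patchified.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_conv1d(tokens, kernel):
    # Depthwise 1D convolution over the token sequence: captures local
    # context, analogous to the convolutional layers in the hybrid block.
    pad = len(kernel) // 2
    padded = np.pad(tokens, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        out[t] = sum(kernel[k] * padded[t + k] for k in range(len(kernel)))
    return out

def slstm_scan(tokens, Wz, Wi, Wf):
    # Simplified sLSTM-style recurrence: input and forget gates mix each
    # candidate into a running cell state, so information from early tokens
    # can reach late tokens at linear cost in sequence length.
    T, d = tokens.shape
    c = np.zeros(d)
    out = np.zeros_like(tokens)
    for t in range(T):
        z = np.tanh(tokens[t] @ Wz)   # candidate update
        i = sigmoid(tokens[t] @ Wi)   # input gate
        f = sigmoid(tokens[t] @ Wf)   # forget gate
        c = f * c + i * z
        out[t] = c
    return out

def hybrid_block(tokens, kernel, Wz, Wi, Wf):
    # Hypothetical hybrid block: conv for locality, xLSTM-style scan for
    # long-range mixing, plus a residual connection (a common pattern;
    # the actual xLSTM-UNet block differs in detail).
    local = local_conv1d(tokens, kernel)
    return tokens + slstm_scan(local, Wz, Wi, Wf)

rng = np.random.default_rng(0)
T, d = 16, 8                                 # 16 patch tokens, 8 channels
tokens = rng.standard_normal((T, d))
kernel = np.array([0.25, 0.5, 0.25])         # small smoothing kernel
Wz, Wi, Wf = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out = hybrid_block(tokens, kernel, Wz, Wi, Wf)
print(out.shape)  # (16, 8): same token grid, now with local + global context
```

In a full UNet, blocks like this would sit at each encoder/decoder stage, with 2D or 3D patches flattened into the token sequence before the scan.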