BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Accurate medical image segmentation is essential for clinical quantification,
disease diagnosis, treatment planning and many other applications. Both
convolution-based and transformer-based u-shaped architectures have achieved
significant success in various medical image segmentation tasks. The former can
efficiently learn local image information but depend on the strong
image-specific inductive biases inherent to the convolution operation. The latter
can effectively capture long-range dependencies at different feature scales using
self-attention, but their compute and memory requirements typically grow
quadratically as sequence length increases. To address
this problem, we integrate the merits of these two paradigms in a
well-designed u-shaped architecture and propose a hybrid yet effective
CNN-Transformer network, named BRAU-Net++, for accurate medical image
segmentation. Specifically, BRAU-Net++ uses bi-level routing attention as
the core building block to design our u-shaped encoder-decoder structure, in
which both encoder and decoder are hierarchically constructed, so as to learn
global semantic information while reducing computational complexity.
Furthermore, this network restructures the skip connections by incorporating
channel-spatial attention, which adopts convolution operations, aiming to
minimize local spatial information loss and amplify global
dimension-interaction of multi-scale features. Extensive experiments on three
public benchmark datasets demonstrate that our proposed approach surpasses
other state-of-the-art methods, including its baseline BRAU-Net, under almost
all evaluation metrics. We achieve an average Dice-Similarity Coefficient
(DSC) of 82.47, 90.10, and 92.94 on Synapse multi-organ segmentation, ISIC-2018
Challenge, and CVC-ClinicDB, respectively, as well as an mIoU of 84.01 and
88.17 on ISIC-2018 Challenge and CVC-ClinicDB, respectively. |
DOI: | 10.48550/arxiv.2401.00722 |
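
The summary reports results as Dice-Similarity Coefficient (DSC) and mIoU. As a reading aid, the sketch below shows how these two overlap measures are commonly defined for a single pair of binary masks. It is not the paper's evaluation code: the function names, the flattened 0/1 mask representation, and the empty-mask convention are assumptions made for illustration, and the summary does not state how scores are averaged over classes or images.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """IoU = |P ∩ T| / |P ∪ T| for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Same assumed convention for empty masks as above.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 6-pixel masks, flattened row by row.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]

print(f"Dice: {dice_coefficient(pred, target):.4f}")
print(f"IoU:  {iou(pred, target):.4f}")
```

For a single mask pair the two measures are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. The values in the summary appear to be expressed as percentages; mIoU would then be the mean of per-class or per-image IoU scores, depending on the benchmark's protocol.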