SMTF: Sparse transformer with multiscale contextual fusion for medical image segmentation


Full Description

Bibliographic Details
Published in: Biomedical Signal Processing and Control, 2024-01, Vol. 87, p. 105458, Article 105458
Main authors: Zhang, Xichu; Zhang, Xiaozhi; Ouyang, Lijun; Qin, Chuanbo; Xiao, Lin; Xiong, Dongping
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract:
• A new sparse Transformer with multiscale contextual fusion is proposed for medical image segmentation.
• SMTF combines convolutional operations and attention mechanisms to capture both local and global information.
• A novel sparse attention is designed to prevent information redundancy and reduce computational complexity.
• A deep supervision strategy is introduced to effectively propagate feature information across layers, preserve more input spatial information, and mitigate information attenuation.
• SMTF boosts segmentation performance while being more robust and efficient.

Medical image segmentation aims to distinguish the object of interest from surrounding tissues and structures, which is essential for reliable diagnosis and morphological analysis of specific lesions. Automatic medical image segmentation has been significantly advanced by deep Convolutional Neural Networks (CNNs). However, CNNs usually fail to model long-range interactions due to the intrinsic locality of convolutional operations, which limits segmentation performance. Recently, the Transformer has been successfully applied to various computer vision tasks, leveraging the self-attention mechanism to model long-range interactions and capture global information. Nevertheless, self-attention lacks spatial locality and computational efficiency. To address these issues, in this work we develop a new sparse medical Transformer (SMTF) with multiscale contextual fusion for medical image segmentation. The proposed model combines convolutional operations and attention mechanisms in a U-shaped framework to capture both local and global information. Specifically, to reduce the computational cost of the traditional Transformer, we design a novel sparse attention module that constructs Transformer layers using a spherical Locality Sensitive Hashing method.
The sparse attention partitions the feature space into different attention buckets, and the attention calculation is conducted only within each individual bucket. The designed sparse Transformer layer further incorporates a bottleneck block to construct the encoder of SMTF. Notably, the proposed sparse Transformer can also aggregate global feature information in early stages, which enables the model to learn more local and global information by incorporating CNNs at lower layers. Furthermore, we introduce a deep supervision strategy to guide the model in fusing multiscale feature information.
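The bucketed attention described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single head, assigns buckets from the queries via random spherical (angular) projections, and computes softmax attention only among tokens sharing a bucket; all function names and parameters here are illustrative.

```python
import numpy as np

def spherical_lsh_buckets(x, n_buckets, rng):
    """Assign each row vector a bucket via random spherical projections:
    normalize, project onto random directions, take argmax over +/- projections."""
    d = x.shape[-1]
    proj = rng.standard_normal((d, n_buckets // 2))
    xn = x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)
    h = xn @ proj
    return np.argmax(np.concatenate([h, -h], axis=-1), axis=-1)

def sparse_attention(q, k, v, n_buckets=4, seed=0):
    """Softmax attention computed only within each LSH bucket,
    so the quadratic cost applies per bucket rather than globally."""
    rng = np.random.default_rng(seed)
    buckets = spherical_lsh_buckets(q, n_buckets, rng)
    out = np.zeros_like(v, dtype=float)
    for b in range(n_buckets):
        idx = np.where(buckets == b)[0]
        if idx.size == 0:
            continue
        qb, kb, vb = q[idx], k[idx], v[idx]
        scores = qb @ kb.T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ vb
    return out
```

If the sequence splits evenly into B buckets, per-bucket attention reduces the score-matrix cost from O(N²) to roughly O(N²/B), at the price of ignoring cross-bucket interactions, which the paper's multiscale fusion and CNN branches help compensate for.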
ISSN: 1746-8094, 1746-8108
DOI: 10.1016/j.bspc.2023.105458