MRSCFusion: Joint Residual Swin Transformer and Multiscale CNN for Unsupervised Multimodal Medical Image Fusion
Published in: IEEE Transactions on Instrumentation and Measurement, 2023, Vol. 72, pp. 1-17
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: It is crucial to integrate the complementary information of multimodal medical images to enhance image quality in clinical diagnosis. Convolutional neural network (CNN)-based deep-learning methods have been widely used for image fusion because of their strong modeling ability; however, CNNs fail to build long-range dependencies within an image, which limits fusion performance. To address this issue, we develop a new unsupervised multimodal medical image fusion framework that combines the Swin Transformer and CNN. The proposed model follows a two-stage training strategy, in which an autoencoder is trained to extract multiple deep features and reconstruct fused images. A novel residual Swin-Convolution fusion (RSCF) module is designed to fuse the multiscale features. Specifically, it consists of a global residual Swin Transformer branch that captures global contextual information and a local gradient residual dense branch that captures local fine-grained information. To integrate more meaningful information effectively and ensure the visual quality of fused images, we define a joint loss function, comprising a content loss and an intensity loss, to constrain the RSCF fusion module; moreover, we introduce an adaptive weight block (AWB) that assigns learnable weights in the loss function, which controls the degree to which information from the source images is preserved. In this way, abundant texture features from magnetic resonance imaging (MRI) images and appropriate intensity information from functional images can be preserved simultaneously. Extensive comparisons have been conducted between the proposed model and other state-of-the-art fusion methods on CT-MRI, PET-MRI, and SPECT-MRI image fusion tasks. Both qualitative and quantitative comparisons demonstrate the superiority of our model.
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2023.3317470
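
The abstract describes a joint loss that combines a content term and an intensity term, with learnable per-source weights from an adaptive weight block (AWB). Below is a minimal, hypothetical PyTorch sketch of that idea; the specific loss terms, the softmax weighting, and all function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a joint fusion loss with a
# gradient-based content term and an intensity term, using learnable weights
# in the spirit of the adaptive weight block (AWB) described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_gradient(img: torch.Tensor) -> torch.Tensor:
    """Approximate image gradients with Sobel kernels (texture/content cue)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return gx.abs() + gy.abs()


class JointFusionLoss(nn.Module):
    """Content (gradient) loss + weighted intensity loss for image fusion."""

    def __init__(self):
        super().__init__()
        # Learnable logits; softmax keeps the two source weights positive and
        # normalized. This stands in for the AWB, whose exact form is unknown.
        self.logits = nn.Parameter(torch.zeros(2))

    def forward(self, fused, mri, func):
        w = torch.softmax(self.logits, dim=0)  # weights for (MRI, functional)
        # Content term: keep fine texture, taking the stronger gradient of
        # the two sources as the target.
        grad_target = torch.maximum(sobel_gradient(mri), sobel_gradient(func))
        content = F.l1_loss(sobel_gradient(fused), grad_target)
        # Intensity term: preserve appropriate brightness from each source.
        intensity = w[0] * F.l1_loss(fused, mri) + w[1] * F.l1_loss(fused, func)
        return content + intensity


# Toy usage on random single-channel "images".
if __name__ == "__main__":
    loss_fn = JointFusionLoss()
    mri = torch.rand(1, 1, 64, 64)
    func = torch.rand(1, 1, 64, 64)
    fused = (mri + func) / 2
    print(loss_fn(fused, mri, func).item())
```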