MRSCFusion: Joint Residual Swin Transformer and Multiscale CNN for Unsupervised Multimodal Medical Image Fusion
Published in: IEEE Transactions on Instrumentation and Measurement, 2023, Vol. 72, pp. 1-17
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: It is crucial to integrate the complementary information of multimodal medical images to enhance image quality in clinical diagnosis. Convolutional neural network (CNN)-based deep-learning methods have been widely used for image fusion because of their strong modeling ability; however, CNNs fail to build long-range dependencies within an image, which limits fusion performance. To address this issue, we develop a new unsupervised multimodal medical image fusion framework that combines the Swin Transformer and CNN. The proposed model follows a two-stage training strategy, in which an autoencoder is trained to extract multiple deep features and reconstruct fused images. A novel residual Swin-Convolution fusion (RSCF) module is designed to fuse the multiscale features. Specifically, it consists of a global residual Swin Transformer branch that captures global contextual information and a local gradient residual dense branch that captures local fine-grained information. To integrate more meaningful information effectively and ensure the visual quality of fused images, we define a joint loss function, comprising a content loss and an intensity loss, to constrain the RSCF fusion module; moreover, we introduce an adaptive weight block (AWB) that assigns learnable weights in the loss function, which controls the degree to which information from the source images is preserved. In this way, abundant texture features from magnetic resonance imaging (MRI) images and appropriate intensity information from functional images can be preserved simultaneously. Extensive comparisons have been conducted between the proposed model and other state-of-the-art fusion methods on CT-MRI, PET-MRI, and SPECT-MRI image fusion tasks. Both qualitative and quantitative comparisons demonstrate the superiority of our model.
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2023.3317470
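
The abstract describes a joint loss that combines a content term and an intensity term, with learnable per-source weights from an adaptive weight block (AWB). Below is a minimal, hypothetical PyTorch sketch of that idea; the specific loss terms, the softmax weighting, and all function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a joint fusion loss with a
# gradient-based content term and an intensity term, using learnable weights
# in the spirit of the adaptive weight block (AWB) described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_gradient(img: torch.Tensor) -> torch.Tensor:
    """Approximate image gradients with Sobel kernels (texture/content cue)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return gx.abs() + gy.abs()


class JointFusionLoss(nn.Module):
    """Content (gradient) loss + weighted intensity loss for image fusion."""

    def __init__(self):
        super().__init__()
        # Learnable logits; softmax keeps the two source weights positive and
        # normalized. This stands in for the AWB, whose exact form is unknown.
        self.logits = nn.Parameter(torch.zeros(2))

    def forward(self, fused, mri, func):
        w = torch.softmax(self.logits, dim=0)  # weights for (MRI, functional)
        # Content term: keep fine texture, taking the stronger gradient of
        # the two sources as the target.
        grad_target = torch.maximum(sobel_gradient(mri), sobel_gradient(func))
        content = F.l1_loss(sobel_gradient(fused), grad_target)
        # Intensity term: preserve appropriate brightness from each source.
        intensity = w[0] * F.l1_loss(fused, mri) + w[1] * F.l1_loss(fused, func)
        return content + intensity


# Toy usage on random single-channel "images".
if __name__ == "__main__":
    loss_fn = JointFusionLoss()
    mri = torch.rand(1, 1, 64, 64)
    func = torch.rand(1, 1, 64, 64)
    fused = (mri + func) / 2
    print(loss_fn(fused, mri, func).item())
```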