Transformer-Based End-to-End Anatomical and Functional Image Fusion


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement, 2022, Vol. 71, p. 1-11
Main Authors: Zhang, Jing; Liu, Aiping; Wang, Dan; Liu, Yu; Wang, Z. Jane; Chen, Xun
Format: Article
Language: English
Description
Summary: Medical image fusion aims to derive complementary information from medical images of different modalities and is becoming increasingly important in clinical applications. The design of the fusion strategy plays a key role in achieving high-quality fusion results. Existing methods usually employ handcrafted fusion rules or convolution-based networks to fuse multimodal medical images. However, these fusion strategies are insufficiently fine-grained and cannot effectively capture the global information of multimodal images. Moreover, deep learning-based fusion methods typically concatenate source images or deep features of different modalities as the network input, which easily leads to inadequate utilization of information from the source images. To address these problems, we propose a transformer-based end-to-end framework for medical image fusion, termed TransFusion. TransFusion introduces the transformer as the fusion strategy, using its self-attention mechanism to incorporate the global contextual information of multimodal features and fuse them adequately. In addition, unlike traditional parallel multibranch architectures or shared networks for multiple inputs, we design branch networks that interact through fusion transformers at multiple scales to exploit the information of different modalities more fully. A natural advantage of this design is the ability to aggregate global multimodal features through self-attention. Both qualitative and quantitative experiments demonstrate the superiority of our method over state-of-the-art fusion methods.
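The abstract's core idea, letting tokens from one modality branch attend over the joint token set of both modalities so that fusion has global context, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy (single head, no learned projections, made-up shapes), not the paper's actual TransFusion architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(feat_a, feat_b):
    """Toy cross-modal attention fusion.

    feat_a, feat_b: (n_tokens, d) feature arrays, e.g. flattened patch
    features from two modality branches (say, MRI and PET).
    Queries come from branch A; keys/values span the tokens of BOTH
    modalities, so each fused token aggregates global multimodal context.
    Returns a (n_tokens_a, d) fused representation for branch A.
    """
    d = feat_a.shape[1]
    tokens = np.concatenate([feat_a, feat_b], axis=0)  # joint token set
    q = feat_a                      # queries from one branch
    k = v = tokens                  # keys/values from both modalities
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_a, n_a + n_b) attention map
    return attn @ v                 # convex combination of all tokens
```

In the paper's multiscale design, an interaction like this would occur at several feature resolutions, with each branch's output fed back into its own network; here only a single fusion step at one scale is shown.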
ISSN: 0018-9456
eISSN: 1557-9662
DOI: 10.1109/TIM.2022.3200426