CAMF: An Interpretable Infrared and Visible Image Fusion Network Based on Class Activation Mapping


Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 4776-4791
Authors: Tang, Linfeng; Chen, Ziang; Huang, Jun; Ma, Jiayi
Format: Article
Language: English
Description
Summary: Image fusion aims to integrate the complementary information of source images and synthesize a single fused image. Existing image fusion algorithms apply hand-crafted fusion rules to merge deep features, which causes information loss and limits fusion performance owing to the uninterpretability of deep learning. To overcome these shortcomings, we propose a learnable fusion rule for infrared and visible image fusion based on class activation mapping. Our fusion rule selectively preserves meaningful information and reduces distortion. More specifically, we first train an encoder-decoder network and an auxiliary classifier built on the shared encoder. Then, the class activation weights, which indicate the importance of each channel, are extracted from the auxiliary classifier. Finally, the deep features extracted by the encoder are adaptively fused according to the class activation weights, and the fused image is reconstructed from the fused features via the pre-trained decoder. Note that our learnable fusion rule automatically measures the importance of each deep feature without human involvement. Moreover, it fully preserves the significant features of the source images, such as salient targets and texture details. Extensive experiments demonstrate our superiority over state-of-the-art algorithms. Visualization of feature maps and their corresponding weights reveals the high interpretability of our method.
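The fusion step described in the summary — weighting each channel of the encoder features by its class activation weight and blending the two modalities — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the use of NumPy instead of a deep learning framework, and the softmax normalization of the two weight vectors are all assumptions for the sake of a self-contained example.

```python
import numpy as np

def cam_weighted_fusion(feat_ir, feat_vis, w_ir, w_vis):
    """Fuse two feature maps channel-wise using class activation weights.

    feat_ir, feat_vis: (C, H, W) deep features from the shared encoder.
    w_ir, w_vis:       (C,) class activation weights taken from the
                       auxiliary classifier, one scalar per channel.
    """
    # Normalize the two weight vectors with a per-channel softmax so the
    # infrared and visible contributions of each channel sum to one.
    w = np.stack([w_ir, w_vis])                     # (2, C)
    w = np.exp(w - w.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    # Broadcast the channel weights over the spatial dims and blend.
    return w[0][:, None, None] * feat_ir + w[1][:, None, None] * feat_vis

# Toy example: 4 channels, 8x8 feature maps.
rng = np.random.default_rng(0)
f_ir, f_vis = rng.normal(size=(2, 4, 8, 8))
w_ir, w_vis = rng.normal(size=(2, 4))
fused = cam_weighted_fusion(f_ir, f_vis, w_ir, w_vis)
print(fused.shape)  # (4, 8, 8)
```

In the paper's pipeline, the fused (C, H, W) tensor would then be passed to the pre-trained decoder to reconstruct the fused image; the weights come from the classifier rather than being hand-crafted, which is what makes the rule learnable.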
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3326296