CCAFusion: Cross-Modal Coordinate Attention Network for Infrared and Visible Image Fusion


Full Description

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-02, Vol. 34 (2), pp. 866-881
Main authors: Li, Xiaoling; Li, Yanfeng; Chen, Houjin; Peng, Yahui; Pan, Pan
Format: Article
Language: English
Online Access: Order full text
Description
Abstract: Infrared and visible image fusion aims to generate a single image with comprehensive information, maintaining both rich texture characteristics and thermal information. However, in existing image fusion methods the fused images either sacrifice the salience of thermal targets and the richness of textures, or introduce interference from useless information such as artifacts. To alleviate these problems, this paper proposes CCAFusion, an effective cross-modal coordinate attention network for infrared and visible image fusion. To fully integrate complementary features, a cross-modal image fusion strategy based on coordinate attention is designed, consisting of a feature-awareness fusion module and a feature-enhancement fusion module. Moreover, a multiscale skip-connection-based network is employed to obtain multiscale features from the infrared and visible images, fully utilizing multi-level information in the fusion process. To reduce the discrepancy between the fused image and the input images, a multiple-constrained loss function comprising a base loss and an auxiliary loss is developed to adjust the gray-level distribution and ensure the harmonious coexistence of structure and intensity in fused images, thereby preventing pollution by useless information such as artifacts. Extensive experiments on widely used datasets demonstrate that CCAFusion outperforms state-of-the-art image fusion methods in both qualitative evaluation and quantitative measurement. Furthermore, its application to salient object detection reveals the potential of CCAFusion for high-level vision tasks, where it can effectively boost detection performance.
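To illustrate the coordinate-attention idea the abstract builds on, here is a minimal numpy sketch: the feature map is pooled along each spatial axis separately, so the resulting attention factors retain positional information along height and width. This is not the paper's actual module; the weight matrices `w_h` and `w_w` are hypothetical stand-ins for the learned transforms in a real implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(feat, w_h, w_w):
    """Sketch of coordinate attention on a (C, H, W) feature map.

    Average-pool along each spatial axis to capture direction-aware
    context, map the pooled descriptors through per-axis weights
    (hypothetical stand-ins for learned 1x1 convolutions), and
    reweight the input with the two resulting attention maps.
    """
    pooled_h = feat.mean(axis=2)     # (C, H): pooled over width
    pooled_w = feat.mean(axis=1)     # (C, W): pooled over height
    att_h = sigmoid(w_h @ pooled_h)  # (C, H) attention along height
    att_w = sigmoid(w_w @ pooled_w)  # (C, W) attention along width
    # Broadcast the two 1-D attention maps back over the feature map.
    return feat * att_h[:, :, None] * att_w[:, None, :]
```

Because each attention factor lies in (0, 1), the module can only scale responses down, emphasizing spatial positions that both axis-wise descriptors agree on, which matches the intuition of highlighting salient thermal targets while suppressing unhelpful responses.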
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2023.3293228