Colliding Depths and Fusion: Leveraging Adaptive Feature Maps and Restorable Depth Recharge for Infrared and Visible Scene Fusion


Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2025, pp. 1-12
Authors: Han, Mina; Yu, Kailong; Li, Weiran; Guo, Qiannan; Li, Zhenbo
Format: Article
Language: English
Description
Abstract: Both the fused expression of scene information from multi-modal images and the pipeline of downstream tasks have become a new focus in the image fusion field. Recently, most studies have proposed multi-task-driven fusion methods. However, these methods employ specifically trained multi-modal fusion components for a certain downstream task, ignoring the broader scenario description and application value of the fusion task itself. To focus on the visual perception of depth features in the fused scenes, we design a new method (CDFGAN) based on Scene Fusion (ScF), with multi-modal geometric depth as the background. Concretely, we leverage adaptive feature maps and a recoverable depth-information supplement for infrared and visible image fusion. First, by devising a Successive Generating Network (SGN) based on geometric interpolation, the structural consistency of the fused scenes is enhanced. We then propose an Adaptive Discriminator Network (ADN) based on an Elastic Feature Mapping Module (EFMM), which reduces the time consumption caused by the module design in the generator and improves the effectiveness of both the discriminator and the generator. Furthermore, a multi-modal Poisson loss function is proposed to align the pixel distributions of the different modalities, ensuring that the fusion results retain structural information closer to that of the inputs. Extensive experiments validate that our method offers advantages and applicability in multiple downstream tasks while improving fusion performance. The code for CDFGAN is available at https://github.com/nanakoMI/CDFGAN.git.
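
The abstract does not give the exact form of the multi-modal Poisson loss, so the following is only a minimal sketch of how such a loss could look in PyTorch: the fused image is treated as a non-negative intensity (rate) map and a Poisson negative log-likelihood is taken against each source modality. The function name multimodal_poisson_loss and the weights w_ir / w_vis are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a multi-modal Poisson loss (not the paper's exact
# formulation): the fused output is penalized against the pixel distributions
# of both the infrared and the visible input via a Poisson NLL term.
import torch
import torch.nn.functional as F


def multimodal_poisson_loss(fused, infrared, visible, w_ir=0.5, w_vis=0.5, eps=1e-8):
    """Weighted Poisson NLL between the fused image and each source modality.

    All tensors are assumed to be non-negative intensity maps of shape
    (B, C, H, W); w_ir and w_vis are illustrative weighting hyperparameters.
    """
    loss_ir = F.poisson_nll_loss(fused, infrared, log_input=False, eps=eps)
    loss_vis = F.poisson_nll_loss(fused, visible, log_input=False, eps=eps)
    return w_ir * loss_ir + w_vis * loss_vis


if __name__ == "__main__":
    # Example usage with random non-negative single-channel images.
    fused = torch.rand(2, 1, 64, 64)
    ir = torch.rand(2, 1, 64, 64)
    vis = torch.rand(2, 1, 64, 64)
    print(multimodal_poisson_loss(fused, ir, vis).item())
```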
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2024.3521751