Co-Enhancement of Multi-Modality Image Fusion and Object Detection via Feature Adaptation
Saved in:
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-12, Vol. 34 (12), pp. 12624-12637
Main authors: , , , , , , ,
Format: Article
Language: English
Online access: Order full text
Abstract: The integration of multi-modality images significantly enhances the clarity of critical details for object detection, while valuable semantic information from object detection in turn enriches the image fusion process. However, this potential reciprocal relationship, which could improve both tasks, remains largely unexplored and underutilized, even though some semantic-driven fusion methods cater to specific application needs. To address these limitations, this study proposes a mutually reinforcing, dual-task-driven fusion architecture. Specifically, our design integrates a feature-adaptive interlinking module into both the image fusion and object detection components, effectively managing the inherent feature discrepancies. The core idea is to channel distinct features from both tasks into a unified feature space after feature transformation. A feature-adaptive selection module then generates features that are rich in target semantic information and compatible with the fusion network. Finally, effective combination and mutual enhancement of the two tasks are achieved through an alternating training process. Extensive evaluations across various datasets confirm the efficiency of the framework, showing visible improvements in both fusion quality and detection accuracy.
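The interlinking-and-selection idea from the abstract (projecting features from both tasks into a unified space, then gating how much detection semantics flows into the fusion branch) can be sketched roughly as below. All dimensions, the linear projections, and the sigmoid gating rule are illustrative assumptions for intuition only, not the authors' actual modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Features from the two tasks live in different spaces (dims are assumed).
fusion_feat = rng.standard_normal((4, 32))   # fusion-branch features
detect_feat = rng.standard_normal((4, 64))   # detection-branch features

# Feature-adaptive interlinking (assumed form): project both feature sets
# into a shared 48-dimensional space via learned linear transforms.
W_fuse = rng.standard_normal((32, 48)) * 0.1
W_det = rng.standard_normal((64, 48)) * 0.1
shared_f = fusion_feat @ W_fuse
shared_d = detect_feat @ W_det

# Feature-adaptive selection (assumed form): a per-channel sigmoid gate
# decides how much detection semantics to inject back into fusion features.
gate = 1.0 / (1.0 + np.exp(-(shared_f * shared_d).mean(axis=0)))
selected = gate * shared_d + (1.0 - gate) * shared_f
print(selected.shape)  # (4, 48)
```

The gate interpolates channel-wise between the two projected feature sets, so channels where the tasks agree lean toward the detection semantics; the paper's actual selection mechanism may differ.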
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3433555