Weighted Guided Optional Fusion Network for RGB-T Salient Object Detection

Bibliographic Details
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, 2024-01, Vol. 20 (5), pp. 1-20, Article 136
Authors: Wang, Jie; Li, Guoqiang; Shi, Jie; Xi, Jinwen
Format: Article
Language: English
Abstract: The rational and effective use of visible and thermal infrared image information to achieve cross-modal complementary fusion is undoubtedly the key to improving the performance of RGB-T salient object detection (SOD). A careful analysis of RGB-T SOD data reveals that it mainly comprises three scenarios: both modalities (RGB and T) contain a significant foreground, only the RGB modality is disturbed, or only the T modality is disturbed. Existing methods, however, pursue more effective cross-modal fusion while treating the two modalities as equal. This assumption of equivalence has two significant limitations. First, it cannot discriminate which modality makes the dominant contribution to performance: even when both modalities show visually significant foregrounds, differences in their imaging properties lead to distinct performance contributions. Second, in a specific acquisition scenario, a pair of images from the two modalities will contribute differently to the final detection performance because of their varying sensitivity to the same background interference. For the RGB-T saliency detection task, it is therefore more reasonable to generate exclusive weights for the two modalities and to select specific fusion mechanisms based on different weight configurations when performing cross-modal complementary integration. Consequently, we propose a weighted guided optional fusion network (WGOFNet) for RGB-T SOD. Specifically, a feature refinement module first performs an initial refinement of the extracted multilevel features. A weight generation module (WGM) then generates an exclusive performance-contribution weight for each of the two modalities, and an optional fusion module (OFM) relies on these weights to perform a particular integration of the cross-modal information. Finally, simple cross-level fusion yields the final saliency prediction map. Comprehensive experiments on three publicly available benchmark datasets demonstrate that the proposed WGOFNet achieves superior performance compared with state-of-the-art RGB-T SOD methods. The source code is available at: https://github.com/WJ-CV/WGOFNet.
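
To make the weight-guided optional fusion idea concrete, below is a minimal PyTorch-style sketch. It is an illustrative interpretation only: the class names WeightGeneration and OptionalFusion, the softmax-based weight computation, and the dominance threshold are all assumptions made for exposition, not the authors' actual WGM/OFM design (see the linked repository for that).

import torch
import torch.nn as nn

class WeightGeneration(nn.Module):
    """Assumed stand-in for the paper's WGM: predicts one scalar
    contribution weight per modality from globally pooled features."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
        )

    def forward(self, f_rgb, f_t):
        # Softmax over the two modality scores -> exclusive weights summing to 1.
        logits = torch.cat([self.score(f_rgb), self.score(f_t)], dim=1)  # (B, 2)
        w = torch.softmax(logits, dim=1)
        return w[:, 0:1], w[:, 1:2]  # one (B, 1) weight per modality

class OptionalFusion(nn.Module):
    """Assumed stand-in for the paper's OFM: selects a fusion path
    depending on whether the weights are balanced or one modality dominates."""
    def __init__(self, channels, threshold=0.65):
        super().__init__()
        self.threshold = threshold  # dominance cutoff; a guessed mechanism
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb, f_t, w_rgb, w_t):
        w_rgb = w_rgb.view(-1, 1, 1, 1)
        w_t = w_t.view(-1, 1, 1, 1)
        if w_rgb.mean() > self.threshold:    # RGB dominant: let RGB lead
            fused = f_rgb + w_t * f_t
        elif w_t.mean() > self.threshold:    # thermal dominant: let T lead
            fused = f_t + w_rgb * f_rgb
        else:                                # balanced: symmetric weighted fusion
            fused = self.fuse(torch.cat([w_rgb * f_rgb, w_t * f_t], dim=1))
        return fused

# Toy usage on one level of refined features.
f_rgb = torch.randn(2, 64, 32, 32)
f_t = torch.randn(2, 64, 32, 32)
wgm = WeightGeneration(64)
ofm = OptionalFusion(64)
w_rgb, w_t = wgm(f_rgb, f_t)
fused = ofm(f_rgb, f_t, w_rgb, w_t)
print(fused.shape)  # torch.Size([2, 64, 32, 32])

In this sketch the exclusive weights serve double duty: they scale each modality's features and, through the threshold test, choose which of the three fusion branches runs, mirroring the abstract's idea of selecting a fusion mechanism per weight configuration.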
ISSN: 1551-6857, 1551-6865
DOI: 10.1145/3624984