MFUR-Net: Multimodal feature fusion and unimodal feature refinement for RGB-D salient object detection


Bibliographic Details
Published in: Knowledge-Based Systems, 2024-09, Vol. 299, p. 112022, Article 112022
Authors: Feng, Zhengqian; Wang, Wei; Li, Wang; Li, Gang; Li, Min; Zhou, Mingle
Format: Article
Language: English
Online access: Full text
Description
Abstract: RGB-D salient object detection aims to integrate multimodal feature information for accurate salient region localization. Despite the development of several RGB-D salient object detection models, existing methods struggle to fuse RGB and Depth features effectively and to exploit their complementary strengths. To address this challenge, this study introduces MFUR-Net, a network based on multimodal feature fusion and unimodal feature refinement. The contributions are threefold. First, a multimodal multilevel feature fusion module is proposed at the encoder stage to integrate multimodal and multilevel features, generating enhanced RGB-D features. Second, a multi-input feature aggregation module is introduced at the decoder stage, which incorporates the RGB and Depth feature streams into the RGB-D feature stream so that they collaborate with the RGB-D features to learn more discriminative information about the salient object. Third, a unimodal saliency feature refinement module refines the saliency feature information of each modality and eliminates redundancy before the feature streams are integrated into the decoder. Through this gradual refinement of saliency features, MFUR-Net achieves accurate saliency map prediction at the decoder stage. The method has been validated through extensive experiments on seven widely used datasets, demonstrating significant advantages over existing state-of-the-art techniques on key performance metrics. The source code is available at https://github.com/wangwei678/MFUR-Net.

Highlights:
• MFUR-Net is proposed for RGB-D saliency detection with novel fusion and refinement mechanisms.
• A multimodal feature fusion module fuses RGB and Depth features into enhanced RGB-D features.
• A unimodal feature refinement module refines features and reduces redundancy.
• A multi-input feature aggregation module aggregates the RGB, Depth, and RGB-D streams.
• MFUR-Net surpasses the state of the art on seven datasets.
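The abstract describes the three modules only at a high level, so the following PyTorch sketch is a minimal, hypothetical illustration of how the described components could be wired together at a single encoder/decoder level. All class names, layer choices (concatenate-and-project fusion, channel attention for refinement), and tensor shapes are assumptions for illustration, not the authors' actual implementation.

import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    # Hypothetical stand-in for the multimodal feature fusion module:
    # merges RGB and Depth features into an RGB-D feature map.
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        # Concatenate along the channel axis, then project back down.
        return self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))

class UnimodalRefinement(nn.Module):
    # Hypothetical refinement step: channel attention that suppresses
    # redundant responses within a single modality's feature stream.
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):
        # Reweight channels; the attention map broadcasts over H and W.
        return feat * self.attn(feat)

class MultiInputAggregation(nn.Module):
    # Hypothetical decoder block: folds the refined RGB and Depth
    # streams back into the RGB-D stream.
    def __init__(self, channels):
        super().__init__()
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, rgbd_feat, rgb_feat, depth_feat):
        return self.merge(torch.cat([rgbd_feat, rgb_feat, depth_feat], dim=1))

# Toy usage with random feature maps at one pyramid level.
if __name__ == "__main__":
    c = 64
    rgb = torch.randn(1, c, 56, 56)
    depth = torch.randn(1, c, 56, 56)
    rgbd = MultimodalFusion(c)(rgb, depth)
    rgb_r = UnimodalRefinement(c)(rgb)
    depth_r = UnimodalRefinement(c)(depth)
    out = MultiInputAggregation(c)(rgbd, rgb_r, depth_r)
    print(out.shape)  # torch.Size([1, 64, 56, 56])

Concatenate-and-project fusion and squeeze-style channel attention are common building blocks in RGB-D saliency networks; the paper's actual module designs may differ substantially and should be taken from the linked repository.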
ISSN: 0950-7051
eISSN: 1872-7409
DOI: 10.1016/j.knosys.2024.112022