DCMNet: Discriminant and cross-modality network for RGB-D salient object detection


Full description

Bibliographic details
Published in: Expert Systems with Applications, 2023-03, Vol. 214, p. 119047, Article 119047
Main authors: Wang, Fasheng; Wang, Ruimin; Sun, Fuming
Format: Article
Language: English
Online access: Full text
Description
Abstract: It is well acknowledged that depth maps contain abundant spatial information that is crucial for explicitly distinguishing foreground from background in Salient Object Detection (SOD). With the help of depth maps, SOD performance has improved substantially. Nevertheless, low-quality depth maps fail to capture accurate spatial information, so it is undesirable to use depth maps indiscriminately. To this end, we propose a Discriminant and Cross-Modality Network (DCMNet) for RGB-D salient object detection. In DCMNet, we integrate a Depth Decomposition and Recomposition Module (DDRM) to filter out low-quality depth maps, and we then apply a quality-enhancement procedure to these detrimental depth maps. Meanwhile, we propose a Multi-Cross Attention Module (MCAM), which combines spatial attention with channel attention in a multi-cross manner to better exploit rich details about the salient object from the RGB stream and the depth stream. In addition, we employ a Res2Net model, named the Image Pretraining Model (IPM), to efficiently extract foreground information. By embedding DDRM, MCAM, and IPM, accuracy increases by a large margin. Extensive experiments demonstrate that our proposed DCMNet outperforms 14 state-of-the-art methods on five challenging public datasets.
•We propose the Depth Decomposition and Recomposition Module for depth restoration.
•We propose the Multi-Cross Attention Module to better exploit attention mechanisms.
•The MCAM exploits multiple features to better locate salient objects.
•We adopt Res2Net as the backbone to extract more detailed features.
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2022.119047