Bidirectional Attentional Interaction Networks for RGB-D salient object detection

Bibliographic Details
Published in: Image and Vision Computing, 2023-10, Vol. 138, p. 104792, Article 104792
Authors: Wei, Weiyi; Xu, Mengyu; Wang, Jian; Luo, Xuzhe
Format: Article
Language: English
Abstract:
• BAINet achieves bidirectional interaction of cross-modality features.
• BAIM captures the complementary cues of different modality data.
• CGAM completes the effective integration of high-level and low-level information.
• BAINet gains great improvements in challenging scenarios.

Aiming at the issues of insufficient cross-modality feature interaction and ineffective use of cross-modality data in RGB-D salient object detection (SOD), we propose a Bidirectional Attentional Interaction Network (BAINet) for RGB-D SOD, which employs an encoder-decoder structure and achieves bidirectional interaction of cross-modality features through a dual-branch progressive fusion approach. First, building on the fact that the RGB and depth information streams complement each other, the bidirectional attention interaction module (BAIM) realizes bidirectional interaction between cross-modality features by capturing complementary cues from the two modalities. To enhance the expressiveness of the fused RGB-D features, the global feature perception module endows them with rich multi-scale contextual semantic information by enlarging the receptive field. In addition, exploring the correlation of cross-level features is vital for accurate saliency inference. Specifically, we introduce a cross-level guidance aggregation module (CGAM) to capture inter-layer dependencies and complete the integration of cross-level features, which effectively suppresses shallow cross-modality features and refines the saliency map during decoding. To speed up model training, a hybrid loss function is used to supervise the multi-branch saliency inference maps simultaneously. Extensive experiments on five publicly available datasets show that the proposed model outperforms 18 state-of-the-art methods.
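
As an informal illustration of the kind of cross-modality interaction the abstract describes, the following minimal PyTorch sketch shows one way two feature streams can attend to each other bidirectionally and then be fused. The module name, the channel-attention design, and the channel sizes are assumptions made for illustration only; they are not taken from the paper.

# Hypothetical sketch of bidirectional cross-modality interaction.
# NOT the authors' BAIM: gating scheme and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalAttentionInteraction(nn.Module):
    """Each modality re-weights the other's channels, then the two streams are fused."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel-attention gate computed from one modality, applied to the other.
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        # 3x3 convolution fuses the two enhanced streams into one RGB-D feature.
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        rgb_enhanced = rgb + rgb * self.depth_gate(depth)    # depth cues guide RGB
        depth_enhanced = depth + depth * self.rgb_gate(rgb)  # RGB cues guide depth
        return self.fuse(torch.cat([rgb_enhanced, depth_enhanced], dim=1))

# Example: fuse 64-channel RGB and depth feature maps of spatial size 44x44.
rgb_feat, depth_feat = torch.randn(1, 64, 44, 44), torch.randn(1, 64, 44, 44)
fused = BidirectionalAttentionInteraction(64)(rgb_feat, depth_feat)
print(fused.shape)  # torch.Size([1, 64, 44, 44])

The residual form (feature plus gated feature) is one common way to let attention from the other modality enhance a stream without discarding its original content; the paper's actual module may differ.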
ISSN: 0262-8856
DOI: 10.1016/j.imavis.2023.104792