TENet: Accurate light-field salient object detection with a transformer embedding network

Bibliographic details
Published in: Image and Vision Computing, 2023-01, Vol. 129, p. 104595, Article 104595
Main authors: Wang, Xingzheng; Chen, Songwei; Wei, Guoyao; Liu, Jiehao
Format: Article
Language: English
Online access: Full text
Abstract: Current light-field salient object detection methods have difficulty in accurately distinguishing objects from complex backgrounds. In this paper, we argue that this problem can be mitigated by optimizing feature fusion and enlarging the receptive field, and we therefore propose a novel transformer embedding network named TENet. The main idea of the network is to (1) selectively aggregate multiple features for more complete feature fusion, and (2) integrate the Transformer for a larger receptive field, so as to identify salient objects accurately. For the former, a multi-modal feature fusion module (MMFF) is first designed to mine the different contributions of the multi-modal features (i.e., all-in-focus image features and focal stack features); a multi-level feature fusion module (MLFF) is then developed to iteratively select and fuse complementary cues from multi-level features. For the latter, we integrate the Transformer for the first time and propose a transformer-based feature enhancement module (TFE) that provides a wider receptive field for each pixel of the high-level features. To validate our idea, we comprehensively evaluate TENet on three challenging datasets. Experimental results show that our method outperforms the state-of-the-art method: the MAE, for example, is improved by 28.1%, 20.3%, and 14.9% on the three datasets, respectively.
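
As a rough illustration of the two ideas named in the abstract, selective multi-modal fusion and a transformer-enlarged receptive field, the following minimal PyTorch sketch may help. It is not the paper's implementation: this record does not specify the MMFF/MLFF/TFE designs, so all class names, layer choices, and hyperparameters below are assumptions made for illustration.

    # Illustrative sketch only: layer choices and hyperparameters are
    # assumptions, not the architecture described in the paper.
    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        """Weighs two modality features (e.g., all-in-focus image features
        vs. focal-stack features) with a learned per-pixel gate, in the
        spirit of mining their different contributions (cf. MMFF)."""
        def __init__(self, channels: int):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.Sigmoid())

        def forward(self, rgb_feat, focal_feat):
            g = self.gate(torch.cat([rgb_feat, focal_feat], dim=1))  # (B, C, H, W) in [0, 1]
            return g * rgb_feat + (1.0 - g) * focal_feat

    class TransformerEnhance(nn.Module):
        """Runs self-attention over all spatial positions of a high-level
        feature map, so every pixel attends to every other pixel, i.e. a
        global receptive field (cf. TFE)."""
        def __init__(self, channels: int, num_heads: int = 8, num_layers: int = 2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(
                d_model=channels, nhead=num_heads,
                dim_feedforward=4 * channels, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, x):
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per pixel
            tokens = self.encoder(tokens)          # global attention across all pixels
            return tokens.transpose(1, 2).reshape(b, c, h, w)

    # Usage on a hypothetical 1/16-resolution high-level feature map:
    rgb = torch.randn(2, 256, 14, 14)
    focal = torch.randn(2, 256, 14, 14)
    fused = GatedFusion(256)(rgb, focal)
    enhanced = TransformerEnhance(256)(fused)
    print(enhanced.shape)  # torch.Size([2, 256, 14, 14])

The gate lets the network weigh, per pixel, how much each modality contributes; the transformer block gives every position of the high-level map a global receptive field, which is the property the abstract attributes to TFE.
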
ISSN: 0262-8856, 1872-8138
DOI: 10.1016/j.imavis.2022.104595