CTIF-Net: A CNN-Transformer Iterative Fusion Network for Salient Object Detection
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2024-05, Vol. 34 (5), pp. 3795-3805 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Capturing sufficient global context and rich spatial structure information is critical for dense prediction tasks. Convolutional neural networks (CNNs) are particularly adept at modeling fine-grained local features, while Transformers excel at modeling global context information, so the two exhibit complementary characteristics. Designing a network that efficiently fuses these two models, leveraging the strengths of both to achieve more accurate detection, is a promising and worthwhile research topic. In this paper, we introduce a novel CNN-Transformer Iterative Fusion Network (CTIF-Net) for salient object detection (SOD). It efficiently combines a CNN and a Transformer to achieve superior performance by using a parallel dual-encoder structure and a feature iterative fusion module. First, CTIF-Net extracts features from the image using the CNN and the Transformer, respectively. Second, two feature convertors and a feature iterative fusion module combine and iteratively refine the two sets of features. Experimental results on multiple SOD datasets show that CTIF-Net outperforms 17 state-of-the-art methods on mainstream evaluation metrics such as F-measure, S-measure, and MAE. Code can be found at https://github.com/danielfaster/CTIF-Net/ . |
---|---|
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2023.3321190 |
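
For readers unfamiliar with the metrics named in the summary, the following is a minimal sketch of how MAE and F-measure are commonly computed for a saliency map. It is not taken from the CTIF-Net code. The function names `mae` and `f_measure`, the fixed binarization threshold of 0.5, and the list-of-lists input format are assumptions made for illustration; the weighting β² = 0.3 is the value conventionally used in salient object detection, and this record does not state the paper's exact evaluation protocol.

```python
def mae(pred, gt):
    """Mean absolute error between two equally sized 2-D maps with values in [0, 1]."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of a saliency map binarized at `threshold` (an assumed value).

    beta_sq = 0.3 weights precision more heavily than recall.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold  # pixel predicted as salient
            actual = g >= 0.5           # pixel salient in the ground truth
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x2 example (hypothetical values, not results from the paper).
prediction = [[0.9, 0.2], [0.6, 0.1]]
ground_truth = [[1.0, 0.0], [0.0, 0.0]]
print(mae(prediction, ground_truth))
print(f_measure(prediction, ground_truth))
```

Lower MAE is better, while higher F-measure is better. S-measure, which compares region and object structure, is omitted here because its definition is considerably longer.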