Highly Efficient RGB-D Salient Object Detection with Adaptive Fusion and Attention Regulation

Existing RGB-D salient object detection (SOD) models have large numbers of parameters, high computational complexity, and slow inference speeds, limiting their deployment on edge devices. To address this issue, we propose a highly efficient network (HENet), focusing on developing lightweight RGB-D SOD models. Specifically, to fairly handle multimodal inputs and capture long-range dependencies of features, we employ a dual-stream structure and use MobileViT as the network encoder. We introduce the Adaptive Edge-Aware Fusion Module (AEFM), which adaptively adjusts the contribution of each feature during fusion based on the amount of information it carries and perceives the edges of the fused features at the pixel level. To compensate for the limited feature extraction capability of the lightweight backbone, we propose the Dual-Branch Feature Enhancement Module (DFEM) to enhance the representation capability of the fused features. Finally, we design the Feature Attention Regulation Module (FARM) to adjust the model's focus in real time. HENet has fewer parameters (11.9M) and lower computational complexity (10.7 GFLOPs), achieving an inference speed of 121 FPS on 384×384 images. Extensive experiments on seven challenging RGB-D SOD datasets demonstrate that HENet outperforms 16 state-of-the-art methods and shows great potential in downstream computer vision tasks. Code and results are available at https://github.com/BojueGao/HENet.
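
For readers who want the fusion idea in concrete terms, the following is a minimal PyTorch sketch of an adaptive, edge-aware RGB-D fusion step. It is an illustration under stated assumptions, not the authors' AEFM: the class name AdaptiveFusionSketch, the 1x1 gating convolution, and the average-pooling edge cue are hypothetical choices, and the released code at https://github.com/BojueGao/HENet remains the authoritative implementation.

import torch
import torch.nn as nn

class AdaptiveFusionSketch(nn.Module):
    """Illustrative adaptive fusion of same-shape RGB and depth feature maps (not the paper's AEFM)."""
    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel gate: predicts two weights (RGB vs. depth) that sum to 1.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        # Crude edge cue: fused map minus a smoothed copy keeps high-frequency content.
        self.smooth = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([f_rgb, f_depth], dim=1))  # (B, 2, H, W) modality weights
        fused = w[:, 0:1] * f_rgb + w[:, 1:2] * f_depth    # adaptive weighted sum
        edges = fused - self.smooth(fused)                  # pixel-level edge signal
        return fused + self.refine(edges)                   # edge-aware refinement

# Example: fuse two 96-channel maps at 1/8 the resolution of a 384x384 input.
# fuse = AdaptiveFusionSketch(96)
# out = fuse(torch.randn(1, 96, 48, 48), torch.randn(1, 96, 48, 48))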

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-11, p.1-1
Main authors: Gao, Haoran; Wang, Fasheng; Wang, Mengyin; Sun, Fuming; Li, Haojie
Format: Article
Language: English
DOI: 10.1109/TCSVT.2024.3502244
ISSN: 1051-8215
EISSN: 1558-2205
Source: IEEE Electronic Library (IEL)
Subjects: Accuracy; Adaptation models; adaptive fusion; Computational complexity; Computational modeling; Decoding; edge-aware; Feature extraction; Image edge detection; lightweight; Object detection; salient object detection; Semantics; Transformers
Online access: Order full text