Highly Efficient RGB-D Salient Object Detection with Adaptive Fusion and Attention Regulation

Existing RGB-D salient object detection (SOD) models have large numbers of parameters, high computational complexity, and slow inference speeds, limiting their deployment on edge devices. To address this issue, we propose a highly efficient network (HENet), focusing on developing lightweight RGB-D SOD models. Specifically, to fairly handle multimodal inputs and capture long-range dependencies of features, we employ a dual-stream structure and use MobileViT as the network encoder. We introduce the Adaptive Edge-Aware Fusion Module (AEFM), which adaptively adjusts the contribution of each feature during fusion based on the amount of information it carries and perceives the edges of the fused features at the pixel level. To compensate for the limited feature extraction capability of the lightweight backbone, we propose the Dual-Branch Feature Enhancement Module (DFEM) to enhance the representation capability of the fused features. Finally, we design the Feature Attention Regulation Module (FARM) to adjust the model's focus in real time. HENet has fewer parameters (11.9M) and lower computational complexity (10.7 GFLOPs), achieving an inference speed of 121 FPS on 384×384 images. Extensive experiments on seven challenging RGB-D SOD datasets demonstrate that HENet outperforms 16 state-of-the-art methods and shows great potential in downstream computer vision tasks. Code and results are available at https://github.com/BojueGao/HENet.
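
For readers who want the fusion idea in concrete terms, the following is a minimal PyTorch sketch of an adaptive, edge-aware RGB-D fusion step. It is an illustration under stated assumptions, not the authors' AEFM: the class name AdaptiveFusionSketch, the 1x1 gating convolution, and the average-pooling edge cue are hypothetical choices, and the released code at https://github.com/BojueGao/HENet remains the authoritative implementation.

import torch
import torch.nn as nn

class AdaptiveFusionSketch(nn.Module):
    """Illustrative adaptive fusion of same-shape RGB and depth feature maps (not the paper's AEFM)."""
    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel gate: predicts two weights (RGB vs. depth) that sum to 1.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        # Crude edge cue: fused map minus a smoothed copy keeps high-frequency content.
        self.smooth = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([f_rgb, f_depth], dim=1))  # (B, 2, H, W) modality weights
        fused = w[:, 0:1] * f_rgb + w[:, 1:2] * f_depth    # adaptive weighted sum
        edges = fused - self.smooth(fused)                  # pixel-level edge signal
        return fused + self.refine(edges)                   # edge-aware refinement

# Example: fuse two 96-channel maps at 1/8 the resolution of a 384x384 input.
# fuse = AdaptiveFusionSketch(96)
# out = fuse(torch.randn(1, 96, 48, 48), torch.randn(1, 96, 48, 48))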

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-11, p.1-1
Main authors: Gao, Haoran; Wang, Fasheng; Wang, Mengyin; Sun, Fuming; Li, Haojie
Format: Article
Language: English
DOI: 10.1109/TCSVT.2024.3502244
ISSN: 1051-8215
EISSN: 1558-2205
Source: IEEE Electronic Library (IEL)
Subjects: Accuracy; Adaptation models; adaptive fusion; Computational complexity; Computational modeling; Decoding; edge-aware; Feature extraction; Image edge detection; lightweight; Object detection; salient object detection; Semantics; Transformers
Online access: Order full text