Highly Efficient RGB-D Salient Object Detection with Adaptive Fusion and Attention Regulation
Existing RGB-D salient object detection (SOD) models have large numbers of parameters, high computational complexity, and slow inference speeds, limiting their deployment on edge devices. To address this issue, we propose a highly efficient network (HENet), focusing on developing lightweight RGB-D S...
Saved in:
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2024-11, p. 1-1 |
Main authors: | Gao, Haoran; Wang, Fasheng; Wang, Mengyin; Sun, Fuming; Li, Haojie |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
container_end_page | 1 |
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | |
creator | Gao, Haoran; Wang, Fasheng; Wang, Mengyin; Sun, Fuming; Li, Haojie |
description | Existing RGB-D salient object detection (SOD) models have large numbers of parameters, high computational complexity, and slow inference speeds, limiting their deployment on edge devices. To address this issue, we propose a highly efficient network (HENet), focusing on developing lightweight RGB-D SOD models. Specifically, to fairly handle multimodal inputs and capture long-range dependencies of features, we employ a dual-stream structure and use MobileViT as the network encoder. We introduce the Adaptive Edge-Aware Fusion Module (AEFM) that adaptively adjusts the contribution of features during the fusion process based on the amount of feature information, and perceives the edges of the fused features at the pixel level. To compensate for the insufficient feature extraction capability of the lightweight backbone network, we propose the Dual-Branch Feature Enhancement Module (DFEM) to enhance the representation capability of the fused features. Finally, we design the Feature Attention Regulation Module (FARM) to adjust the model's focus in real time. HENet has fewer parameters (11.9M) and lower computational complexity (10.7 GFLOPs), achieving an inference speed of 121 FPS for 384×384 images. Extensive experiments are conducted on seven challenging RGB-D SOD datasets. The experimental results demonstrate that HENet outperforms 16 state-of-the-art methods and shows great potential in downstream computer vision tasks. Code and results are available at https://github.com/BojueGao/HENet. (An illustrative fusion sketch, separate from that code, follows the record fields below.) |
doi_str_mv | 10.1109/TCSVT.2024.3502244 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2024-11, p.1-1 |
issn | 1051-8215; 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_10758288 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy; Adaptation models; adaptive fusion; Computational complexity; Computational modeling; Decoding; edge-aware; Feature extraction; Image edge detection; lightweight; Object detection; salient object detection; Semantics; Transformers |
title | Highly Efficient RGB-D Salient Object Detection with Adaptive Fusion and Attention Regulation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T01%3A27%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Highly%20Efficient%20RGB-D%20Salient%20Object%20Detection%20with%20Adaptive%20Fusion%20and%20Attention%20Regulation&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Gao,%20Haoran&rft.date=2024-11-18&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2024.3502244&rft_dat=%3Ccrossref_RIE%3E10_1109_TCSVT_2024_3502244%3C/crossref_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10758288&rfr_iscdi=true |
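The description field above outlines how HENet fuses RGB and depth features: an Adaptive Edge-Aware Fusion Module (AEFM) re-weights each modality's contribution according to its information content and perceives object edges at the pixel level. The snippet below is a minimal, hypothetical PyTorch sketch of that general idea, not the authors' implementation; the class name AdaptiveFusionSketch, the softmax gate, and the edge_head layer are all invented for illustration. The official code is linked in the record (https://github.com/BojueGao/HENet).

```python
# Hypothetical sketch only -- not HENet's AEFM. It illustrates adaptively
# weighting RGB and depth feature maps before fusion, plus a coarse
# pixel-level edge prediction, as described in the abstract.
import torch
import torch.nn as nn


class AdaptiveFusionSketch(nn.Module):
    """Toy adaptive fusion: a learned gate decides how much each modality
    contributes at every pixel (all names and layer choices are invented)."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict per-pixel modality weights (summing to 1) from both inputs.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),
        )
        # Stand-in for the "edge-aware" part: a 1x1 conv producing edge logits.
        self.edge_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor):
        weights = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        fused = weights[:, 0:1] * rgb_feat + weights[:, 1:2] * depth_feat
        edge_logits = self.edge_head(fused)  # coarse pixel-level edge cue
        return fused, edge_logits


if __name__ == "__main__":
    # Dummy feature maps standing in for one encoder stage of a 384x384 input.
    fusion = AdaptiveFusionSketch(channels=64)
    rgb = torch.randn(1, 64, 48, 48)
    depth = torch.randn(1, 64, 48, 48)
    fused, edges = fusion(rgb, depth)
    print(fused.shape, edges.shape)  # (1, 64, 48, 48) and (1, 1, 48, 48)
```

In the paper's architecture such a fusion step would sit between the stages of the dual-stream MobileViT encoder; here it is exercised on random 48×48 feature maps purely to show the tensor shapes involved.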