Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often rely on Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction. However, these methods still show weak performance and poor generalization in complex cases. Recently, the Segment Anything Model (SAM) has been proposed as a visual foundation model with strong segmentation and generalization capabilities. Nonetheless, SAM requires accurate prompts for target objects, which are unavailable in SOD. Moreover, SAM neither exploits multi-scale and multi-level information nor incorporates fine-grained details. To address these shortcomings, we propose a Multi-scale and Detail-enhanced SAM (MDSAM) for SOD. Specifically, we first introduce a Lightweight Multi-Scale Adapter (LMSA), which allows SAM to learn multi-scale information with very few trainable parameters. Then, we propose a Multi-Level Fusion Module (MLFM) to comprehensively exploit the multi-level information from SAM's encoder. Finally, we propose a Detail Enhancement Module (DEM) to equip SAM with fine-grained details. Experimental results demonstrate the superior performance of our model on multiple SOD datasets and its strong generalization to other segmentation tasks. The source code is released at https://github.com/BellyBeauty/MDSAM.
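The abstract outlines three add-on modules built around a frozen SAM encoder. To make the adapter idea behind LMSA concrete, here is a minimal PyTorch sketch of a bottleneck adapter with parallel dilated depthwise convolutions over the ViT token grid; the class name, bottleneck width, and dilation rates are illustrative assumptions, not the authors' implementation (the real code is in the linked repository).

```python
import torch
import torch.nn as nn

class MultiScaleAdapterSketch(nn.Module):
    """Bottleneck adapter with parallel dilated depthwise convolutions.

    Hypothetical illustration of the LMSA idea: inject multi-scale context
    into frozen ViT features with very few trainable parameters.
    """

    def __init__(self, dim: int, bottleneck: int = 32, dilations=(1, 2, 3)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # trainable down-projection
        # Depthwise 3x3 convolutions with different dilations act as cheap
        # parallel receptive fields over the 2D token grid.
        self.branches = nn.ModuleList(
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3,
                      padding=d, dilation=d, groups=bottleneck)
            for d in dilations
        )
        self.up = nn.Linear(bottleneck, dim)     # trainable up-projection
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor, hw: tuple) -> torch.Tensor:
        # tokens: (B, N, C) patch tokens from a ViT block; hw: token grid size.
        b, n, c = tokens.shape
        h, w = hw
        x = self.act(self.down(tokens))                 # (B, N, bottleneck)
        x = x.transpose(1, 2).reshape(b, -1, h, w)      # (B, bottleneck, H, W)
        x = sum(branch(x) for branch in self.branches)  # fuse multi-scale context
        x = x.flatten(2).transpose(1, 2)                # (B, N, bottleneck)
        return tokens + self.up(self.act(x))            # residual adapter update

# Usage: one adapter per frozen encoder block; only adapter weights are trained.
adapter = MultiScaleAdapterSketch(dim=768)
feats = torch.randn(2, 14 * 14, 768)   # dummy ViT-B patch tokens
out = adapter(feats, hw=(14, 14))
print(out.shape)                        # torch.Size([2, 196, 768])
```

Freezing the backbone and training only such residual adapters is a common parameter-efficient fine-tuning pattern; the paper's MLFM and DEM similarly operate on multi-level encoder features and fine-grained image details, but their designs are not reproduced here.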

Bibliographic Details
Main Authors: Gao, Shixuan; Zhang, Pingping; Yan, Tianyu; Lu, Huchuan
Format: Article
Language: English
Published: 2024-08-08
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia
Online Access: Order full text
creator Gao, Shixuan
Zhang, Pingping
Yan, Tianyu
Lu, Huchuan
doi_str_mv 10.48550/arxiv.2408.04326
format Article
identifier DOI: 10.48550/arxiv.2408.04326
language eng
recordid cdi_arxiv_primary_2408_04326
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Multimedia
title Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection
url https://arxiv.org/abs/2408.04326