Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

Salient Object Detection (SOD) aims to identify and segment the most prominent objects in images. Advanced SOD methods often rely on Convolutional Neural Networks (CNNs) or Transformers for deep feature extraction. However, these methods still show weak performance and poor generalization in complex cases. Recently, the Segment Anything Model (SAM) has been proposed as a visual foundation model with strong segmentation and generalization capabilities. Nonetheless, SAM requires accurate prompts for target objects, which are unavailable in SOD. Moreover, SAM neither exploits multi-scale and multi-level information nor incorporates fine-grained details. To address these shortcomings, we propose a Multi-scale and Detail-enhanced SAM (MDSAM) for SOD. Specifically, we first introduce a Lightweight Multi-Scale Adapter (LMSA), which allows SAM to learn multi-scale information with very few trainable parameters. Then, we propose a Multi-Level Fusion Module (MLFM) to comprehensively exploit the multi-level information from SAM's encoder. Finally, we propose a Detail Enhancement Module (DEM) to equip SAM with fine-grained details. Experimental results demonstrate the superior performance of our model on multiple SOD datasets and its strong generalization to other segmentation tasks. The source code is released at https://github.com/BellyBeauty/MDSAM.
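The abstract outlines three add-on modules built around a frozen SAM encoder. To make the adapter idea behind LMSA concrete, here is a minimal PyTorch sketch of a bottleneck adapter with parallel dilated depthwise convolutions over the ViT token grid; the class name, bottleneck width, and dilation rates are illustrative assumptions, not the authors' implementation (the real code is in the linked repository).

```python
import torch
import torch.nn as nn

class MultiScaleAdapterSketch(nn.Module):
    """Bottleneck adapter with parallel dilated depthwise convolutions.

    Hypothetical illustration of the LMSA idea: inject multi-scale context
    into frozen ViT features with very few trainable parameters.
    """

    def __init__(self, dim: int, bottleneck: int = 32, dilations=(1, 2, 3)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # trainable down-projection
        # Depthwise 3x3 convolutions with different dilations act as cheap
        # parallel receptive fields over the 2D token grid.
        self.branches = nn.ModuleList(
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3,
                      padding=d, dilation=d, groups=bottleneck)
            for d in dilations
        )
        self.up = nn.Linear(bottleneck, dim)     # trainable up-projection
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor, hw: tuple) -> torch.Tensor:
        # tokens: (B, N, C) patch tokens from a ViT block; hw: token grid size.
        b, n, c = tokens.shape
        h, w = hw
        x = self.act(self.down(tokens))                 # (B, N, bottleneck)
        x = x.transpose(1, 2).reshape(b, -1, h, w)      # (B, bottleneck, H, W)
        x = sum(branch(x) for branch in self.branches)  # fuse multi-scale context
        x = x.flatten(2).transpose(1, 2)                # (B, N, bottleneck)
        return tokens + self.up(self.act(x))            # residual adapter update

# Usage: one adapter per frozen encoder block; only adapter weights are trained.
adapter = MultiScaleAdapterSketch(dim=768)
feats = torch.randn(2, 14 * 14, 768)   # dummy ViT-B patch tokens
out = adapter(feats, hw=(14, 14))
print(out.shape)                        # torch.Size([2, 196, 768])
```

Freezing the backbone and training only such residual adapters is a common parameter-efficient fine-tuning pattern; the paper's MLFM and DEM similarly operate on multi-level encoder features and fine-grained image details, but their designs are not reproduced here.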

Bibliographic Details
Main Authors: Gao, Shixuan; Zhang, Pingping; Yan, Tianyu; Lu, Huchuan
Format: Article
Language: English
Published: 2024-08-08
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Multimedia
Online Access: Order full text
creator Gao, Shixuan
Zhang, Pingping
Yan, Tianyu
Lu, Huchuan
doi_str_mv 10.48550/arxiv.2408.04326
format Article
identifier DOI: 10.48550/arxiv.2408.04326
language eng
recordid cdi_arxiv_primary_2408_04326
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Multimedia
title Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection
url https://arxiv.org/abs/2408.04326