An approach to improve SSD through mask prediction of multi-scale feature maps

We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise pr...

Full description


Bibliographic details
Published in:	Pattern analysis and applications : PAA 2021-08, Vol.24 (3), p.1357-1366
Main authors:	Sun, Peng; Zhao, Yaqin; Zhu, Songhao
Format:	Article
Language:	eng
Subjects:	Computer networks; Computer Science; Feature extraction; Feature maps; Object recognition; Pattern Recognition; Performance enhancement; Semantics; Short Paper
Online access:	Full text

container_end_page	1366
container_issue	3
container_start_page	1357
container_title	Pattern analysis and applications : PAA
container_volume	24
creator	Sun, Peng; Zhao, Yaqin; Zhu, Songhao
description	We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise probability distribution of semantic information. Meanwhile, an improved receptive field block is adopted to increase the scale of receptive field of backbone network without too much extra computing burden. Our network improves the performance significantly over SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with just a little speed drop. In addition, we discuss the relationship between effective receptive fields and theoretical receptive fields on VGG16 backbone network. Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve a mAP of 79.8 with 300 × 300 input images (81.2 mAP by 512 × 512 inputs) at the speed of 58.4 FPS on a single Nvidia 1080Ti GPU. Experimental results demonstrate that the proposed network achieves a comparable performance with the state-of-the-arts.
doi_str_mv	10.1007/s10044-021-00993-x
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2557061121</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2557061121</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-22b5f9819e2cbc44b614a84ca94a6695465ac762c7d7abeba3d799b6948c37de3</originalsourceid><addsrcrecordid>eNp9kM1OwzAQhC0EEqXwApwscTb4L3F8rAoUpAoOBYmb5ThOm9LEwXZQeXsMQXDjsjuHb2ZXA8A5wZcEY3EV0uQcYUoQxlIytD8AE8IZQyLLXg5_NSfH4CSELcaMMVpMwMOsg7rvvdNmA6ODTZv0u4Wr1TWMG--G9Qa2OrzC3tuqMbFxHXQ1bIddbFAwemdhbXUcvE1YH07BUa13wZ797Cl4vr15mt-h5ePifj5bIsOIjIjSMqtlQaSlpjSclznhuuBGS67zXGY8z7QROTWiErq0pWaVkLLMJS8ME5VlU3Ax5qZv3wYbotq6wXfppKJZJnBOCCWJoiNlvAvB21r1vmm1_1AEq6_e1NibSr2p797UPpnYaAoJ7tbW_0X_4_oEOT1w2w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2557061121</pqid></control><display><type>article</type><title>An approach to improve SSD through mask prediction of multi-scale feature maps</title><source>SpringerLink Journals - AutoHoldings</source><creator>Sun, Peng ; Zhao, Yaqin ; Zhu, Songhao</creator><creatorcontrib>Sun, Peng ; Zhao, Yaqin ; Zhu, Songhao</creatorcontrib><description>We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise probability distribution of semantic information. Meanwhile, an improved receptive field block is adopted to increase the scale of receptive field of backbone network without too much extra computing burden. Our network improves the performance significantly over SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with just a little speed drop. 
In addition, we discuss the relationship between effective receptive fields and theoretical receptive fields on VGG16 backbone network. Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve a mAP of 79.8 with 300 × 300 input images (81.2 mAP by 512 × 512 inputs) at the speed of 58.4 FPS on a single Nvidia 1080Ti GPU. Experimental results demonstrate that the proposed network achieves a comparable performance with the state-of-the-arts.</description><identifier>ISSN: 1433-7541</identifier><identifier>EISSN: 1433-755X</identifier><identifier>DOI: 10.1007/s10044-021-00993-x</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Computer networks ; Computer Science ; Feature extraction ; Feature maps ; Object recognition ; Pattern Recognition ; Performance enhancement ; Semantics ; Short Paper</subject><ispartof>Pattern analysis and applications : PAA, 2021-08, Vol.24 (3), p.1357-1366</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-22b5f9819e2cbc44b614a84ca94a6695465ac762c7d7abeba3d799b6948c37de3</citedby><cites>FETCH-LOGICAL-c319t-22b5f9819e2cbc44b614a84ca94a6695465ac762c7d7abeba3d799b6948c37de3</cites><orcidid>0000-0002-9891-5692</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10044-021-00993-x$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s10044-021-00993-x$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27923,27924,41487,42556,51318</link.rule.ids></links><search><creatorcontrib>Sun, Peng</creatorcontrib><creatorcontrib>Zhao, Yaqin</creatorcontrib><creatorcontrib>Zhu, Songhao</creatorcontrib><title>An approach to improve SSD through mask prediction of multi-scale feature maps</title><title>Pattern analysis and applications : PAA</title><addtitle>Pattern Anal Applic</addtitle><description>We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise probability distribution of semantic information. Meanwhile, an improved receptive field block is adopted to increase the scale of receptive field of backbone network without too much extra computing burden. Our network improves the performance significantly over SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with just a little speed drop. In addition, we discuss the relationship between effective receptive fields and theoretical receptive fields on VGG16 backbone network. 
Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve a mAP of 79.8 with 300 × 300 input images (81.2 mAP by 512 × 512 inputs) at the speed of 58.4 FPS on a single Nvidia 1080Ti GPU. Experimental results demonstrate that the proposed network achieves a comparable performance with the state-of-the-arts.</description><subject>Computer networks</subject><subject>Computer Science</subject><subject>Feature extraction</subject><subject>Feature maps</subject><subject>Object recognition</subject><subject>Pattern Recognition</subject><subject>Performance enhancement</subject><subject>Semantics</subject><subject>Short Paper</subject><issn>1433-7541</issn><issn>1433-755X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAQhC0EEqXwApwscTb4L3F8rAoUpAoOBYmb5ThOm9LEwXZQeXsMQXDjsjuHb2ZXA8A5wZcEY3EV0uQcYUoQxlIytD8AE8IZQyLLXg5_NSfH4CSELcaMMVpMwMOsg7rvvdNmA6ODTZv0u4Wr1TWMG--G9Qa2OrzC3tuqMbFxHXQ1bIddbFAwemdhbXUcvE1YH07BUa13wZ797Cl4vr15mt-h5ePifj5bIsOIjIjSMqtlQaSlpjSclznhuuBGS67zXGY8z7QROTWiErq0pWaVkLLMJS8ME5VlU3Ax5qZv3wYbotq6wXfppKJZJnBOCCWJoiNlvAvB21r1vmm1_1AEq6_e1NibSr2p797UPpnYaAoJ7tbW_0X_4_oEOT1w2w</recordid><startdate>20210801</startdate><enddate>20210801</enddate><creator>Sun, Peng</creator><creator>Zhao, Yaqin</creator><creator>Zhu, Songhao</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-9891-5692</orcidid></search><sort><creationdate>20210801</creationdate><title>An approach to improve SSD through mask prediction of multi-scale feature maps</title><author>Sun, Peng ; Zhao, Yaqin ; Zhu, 
Songhao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-22b5f9819e2cbc44b614a84ca94a6695465ac762c7d7abeba3d799b6948c37de3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Computer networks</topic><topic>Computer Science</topic><topic>Feature extraction</topic><topic>Feature maps</topic><topic>Object recognition</topic><topic>Pattern Recognition</topic><topic>Performance enhancement</topic><topic>Semantics</topic><topic>Short Paper</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Peng</creatorcontrib><creatorcontrib>Zhao, Yaqin</creatorcontrib><creatorcontrib>Zhu, Songhao</creatorcontrib><collection>CrossRef</collection><jtitle>Pattern analysis and applications : PAA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Peng</au><au>Zhao, Yaqin</au><au>Zhu, Songhao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An approach to improve SSD through mask prediction of multi-scale feature maps</atitle><jtitle>Pattern analysis and applications : PAA</jtitle><stitle>Pattern Anal Applic</stitle><date>2021-08-01</date><risdate>2021</risdate><volume>24</volume><issue>3</issue><spage>1357</spage><epage>1366</epage><pages>1357-1366</pages><issn>1433-7541</issn><eissn>1433-755X</eissn><abstract>We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise probability distribution of semantic information. Meanwhile, an improved receptive field block is adopted to increase the scale of receptive field of backbone network without too much extra computing burden. 
Our network improves the performance significantly over SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with just a little speed drop. In addition, we discuss the relationship between effective receptive fields and theoretical receptive fields on VGG16 backbone network. Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve a mAP of 79.8 with 300 × 300 input images (81.2 mAP by 512 × 512 inputs) at the speed of 58.4 FPS on a single Nvidia 1080Ti GPU. Experimental results demonstrate that the proposed network achieves a comparable performance with the state-of-the-arts.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s10044-021-00993-x</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-9891-5692</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1433-7541
ispartof	Pattern analysis and applications : PAA, 2021-08, Vol.24 (3), p.1357-1366
issn	1433-7541 1433-755X
language	eng
recordid	cdi_proquest_journals_2557061121
source	SpringerLink Journals - AutoHoldings
subjects	Computer networks; Computer Science; Feature extraction; Feature maps; Object recognition; Pattern Recognition; Performance enhancement; Semantics; Short Paper
title	An approach to improve SSD through mask prediction of multi-scale feature maps
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T19%3A50%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20approach%20to%20improve%20SSD%20through%20mask%20prediction%20of%20multi-scale%20feature%20maps&rft.jtitle=Pattern%20analysis%20and%20applications%20:%20PAA&rft.au=Sun,%20Peng&rft.date=2021-08-01&rft.volume=24&rft.issue=3&rft.spage=1357&rft.epage=1366&rft.pages=1357-1366&rft.issn=1433-7541&rft.eissn=1433-755X&rft_id=info:doi/10.1007/s10044-021-00993-x&rft_dat=%3Cproquest_cross%3E2557061121%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2557061121&rft_id=info:pmid/&rfr_iscdi=true