SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection

• APPK module is proposed to tackle misalignments of different scales of objects.
• IoU-adaptive loss function helps networks to deal with the hard negative samples.
• SORR module is devised to improve the detection efficiency.
• Interleaved subsampling method can enhance feature representations.
Using sin...
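
The highlights above mention an IoU-adaptive loss, but this record does not state its form. As background only, the following is a minimal sketch of the standard intersection-over-union (IoU) between two axis-aligned boxes, the quantity on which such a loss would presumably be built. The `Box` layout and the `iou` helper are illustrative assumptions, not the authors' implementation.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as having no overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A loss that adapts to IoU would typically weight or threshold samples by this value; how SnipeDet does so is described in the full text, not here.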

Detailed description

Bibliographic details
Published in:	Pattern recognition letters, 2021-12, Vol.152, p.302-310
Main authors:	Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie
Format:	Article
Language:	eng
Subjects:	Attention mechanism; Detectors; Feature enhancement; Feature maps; Hard negative mining; Kernels; Misalignment; Modules; Object detection; Object recognition; Prediction module; Predictions; Salience; Sensors
Online access:	Full text

container_end_page	310
container_issue
container_start_page	302
container_title	Pattern recognition letters
container_volume	152
creator	Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie
description	• The APPK module is proposed to tackle misalignments of different scales of objects. • An IoU-adaptive loss function helps networks deal with hard negative samples. • The SORR module is devised to improve detection efficiency. • An interleaved subsampling method can enhance feature representations. Single-scale prediction kernels and Region of Interest (RoI) pooling in the prediction modules of modern object detectors do not match objects of different scales well. State-of-the-art detectors with a feature pyramid structure built on feature maps of different resolutions help alleviate this problem. Even with this structure, single-scale prediction kernels and RoI pooling still struggle to detect small objects, and the former continue to suffer from misalignment on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels (APPK) module with a customized IoU-adaptive loss function to deal with the misalignment between the prediction module and objects of different scales. To mitigate the cost of the heavy detection head, we also introduce the salient object regions recognition (SORR) module to identify the regions that have strong object cues. Additionally, interleaved subsampling, the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework formed by these methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, outperforming state-of-the-art one-stage detectors with a better trade-off between speed and accuracy.
doi_str_mv	10.1016/j.patrec.2021.10.026
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2623463130</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865521003858</els_id><sourcerecordid>2623463130</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-Aw8Bz635aJrWg7Csn7DgwfUc0mSypO62Nc0K--9NXc-eBt55ZoZ5ELqmJKeElrdtPugYwOSMMJqinLDyBM1oJVkmeVGcolnCZFaVQpyji3FsCSElr6sZWr93foAHiHd4ESN00fddttl7CxYPh6B33uotHgJYb6Ye_oTQwXbErg94Ax0Eb3DftGAithDhF7pEZ05vR7j6q3P08fS4Xr5kq7fn1-VilRnOi5hZTlglmobWNZEFgK4LXhrnhJO1qLQ01DEhuOaCNJKlhEltHReu0Uw72fA5ujnuHUL_tYcxqrbfhy6dVKxkvCg55SRRxZEyoR_HAE4Nwe90OChK1ORPteroT03-pjT5S2P3x7H0LXx7CGo0HjqTTCQ0Ktv7_xf8APVDe8I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2623463130</pqid></control><display><type>article</type><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><source>Elsevier ScienceDirect Journals</source><creator>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</creator><creatorcontrib>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</creatorcontrib><description>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. 
Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2021.10.026</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Attention mechanism ; Detectors ; Feature enhancement ; Feature maps ; Hard negative mining ; Kernels ; Misalignment ; Modules ; Object detection ; Object recognition ; Prediction module ; Predictions ; Salience ; Sensors</subject><ispartof>Pattern recognition letters, 2021-12, Vol.152, p.302-310</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Dec 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</citedby><cites>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0167865521003858$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Chen, Suting</creatorcontrib><creatorcontrib>Cheng, Zehua</creatorcontrib><creatorcontrib>Zhang, Liangchen</creatorcontrib><creatorcontrib>Zheng, Yujie</creatorcontrib><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><title>Pattern recognition letters</title><description>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. 
In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</description><subject>Attention mechanism</subject><subject>Detectors</subject><subject>Feature enhancement</subject><subject>Feature maps</subject><subject>Hard negative mining</subject><subject>Kernels</subject><subject>Misalignment</subject><subject>Modules</subject><subject>Object detection</subject><subject>Object recognition</subject><subject>Prediction module</subject><subject>Predictions</subject><subject>Salience</subject><subject>Sensors</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-Aw8Bz635aJrWg7Csn7DgwfUc0mSypO62Nc0K--9NXc-eBt55ZoZ5ELqmJKeElrdtPugYwOSMMJqinLDyBM1oJVkmeVGcolnCZFaVQpyji3FsCSElr6sZWr93foAHiHd4ESN00fddttl7CxYPh6B33uotHgJYb6Ye_oTQwXbErg94Ax0Eb3DftGAithDhF7pEZ05vR7j6q3P08fS4Xr5kq7fn1-VilRnOi5hZTlglmobWNZEFgK4LXhrnhJO1qLQ01DEhuOaCNJKlhEltHReu0Uw72fA5ujnuHUL_tYcxqrbfhy6dVKxkvCg55SRRxZEyoR_HAE4Nwe90OChK1ORPteroT03-pjT5S2P3x7H0LXx7CGo0HjqTTCQ0Ktv7_xf8APVDe8I</recordid><startdate>202112</startdate><enddate>202112</enddate><creator>Chen, 
Suting</creator><creator>Cheng, Zehua</creator><creator>Zhang, Liangchen</creator><creator>Zheng, Yujie</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>202112</creationdate><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><author>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Attention mechanism</topic><topic>Detectors</topic><topic>Feature enhancement</topic><topic>Feature maps</topic><topic>Hard negative mining</topic><topic>Kernels</topic><topic>Misalignment</topic><topic>Modules</topic><topic>Object detection</topic><topic>Object recognition</topic><topic>Prediction module</topic><topic>Predictions</topic><topic>Salience</topic><topic>Sensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Suting</creatorcontrib><creatorcontrib>Cheng, Zehua</creatorcontrib><creatorcontrib>Zhang, Liangchen</creatorcontrib><creatorcontrib>Zheng, Yujie</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Pattern 
recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Suting</au><au>Cheng, Zehua</au><au>Zhang, Liangchen</au><au>Zheng, Yujie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</atitle><jtitle>Pattern recognition letters</jtitle><date>2021-12</date><risdate>2021</risdate><volume>152</volume><spage>302</spage><epage>310</epage><pages>302-310</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. 
Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2021.10.026</doi><tpages>9</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0167-8655
ispartof	Pattern recognition letters, 2021-12, Vol.152, p.302-310
issn	0167-8655 1872-7344
language	eng
recordid	cdi_proquest_journals_2623463130
source	Elsevier ScienceDirect Journals
subjects	Attention mechanism; Detectors; Feature enhancement; Feature maps; Hard negative mining; Kernels; Misalignment; Modules; Object detection; Object recognition; Prediction module; Predictions; Salience; Sensors
title	SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A27%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SnipeDet:%20Attention-guided%20pyramidal%20prediction%20kernels%20for%20generic%20object%20detection&rft.jtitle=Pattern%20recognition%20letters&rft.au=Chen,%20Suting&rft.date=2021-12&rft.volume=152&rft.spage=302&rft.epage=310&rft.pages=302-310&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2021.10.026&rft_dat=%3Cproquest_cross%3E2623463130%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623463130&rft_id=info:pmid/&rft_els_id=S0167865521003858&rfr_iscdi=true
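
The `url` field above is an OpenURL query (its `ctx_ver` parameter names Z39.88-2004) that carries the citation as percent-encoded `rft.*` parameters. As a rough illustration of how those parameters map back to the bibliographic fields in this record, here is a minimal sketch using only Python's standard library. The helper name `citation_fields` and the shortened example URL are assumptions made for this sketch; the shortened URL keeps only a few of the parameters from the full link.

```python
from urllib.parse import parse_qs, urlsplit


def citation_fields(openurl: str) -> dict:
    """Return the 'rft.*' referent parameters of an OpenURL as a plain dict."""
    query = urlsplit(openurl).query
    params = parse_qs(query)  # percent-decodes values; each key maps to a list
    prefix = "rft."
    # Keep the first value of every referent key and drop the 'rft.' prefix.
    return {
        key[len(prefix):]: values[0]
        for key, values in params.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Shortened version of the link resolver URL in this record (assumption:
    # only a subset of the original parameters is reproduced here).
    example = (
        "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
        "&rft.genre=article"
        "&rft.jtitle=Pattern%20recognition%20letters"
        "&rft.volume=152&rft.spage=302&rft.epage=310"
        "&rft.issn=0167-8655"
    )
    for name, value in citation_fields(example).items():
        print(f"{name}: {value}")
```

Parameters such as `rft_id` and `rft_val_fmt` use an underscore rather than a dot, so this sketch deliberately leaves them out; they would need separate handling.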