SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection

• APPK module is proposed to tackle misalignments of different scales of objects.
• IoU-adaptive loss function helps networks to deal with the hard negative samples.
• SORR module is devised to improve the detection efficiency.
• Interleaved subsampling method can enhance feature representations.
Using sin...

Bibliographic details
Published in: Pattern recognition letters, 2021-12, Vol.152, p.302-310
Main authors: Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 310
container_issue
container_start_page 302
container_title Pattern recognition letters
container_volume 152
creator Chen, Suting
Cheng, Zehua
Zhang, Liangchen
Zheng, Yujie
description • APPK module is proposed to tackle misalignments of different scales of objects.
• IoU-adaptive loss function helps networks to deal with the hard negative samples.
• SORR module is devised to improve the detection efficiency.
• Interleaved subsampling method can enhance feature representations.
Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with a feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. Even with this structure, single-scale prediction kernels and RoI pooling still struggle to detect small objects, and the former continue to encounter the misalignment problem on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of a heavy detection head, we also introduce the salient object regions recognition module to identify the regions that have strong object cues. Additionally, interleaved subsampling, the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at a speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.
doi_str_mv 10.1016/j.patrec.2021.10.026
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2623463130</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167865521003858</els_id><sourcerecordid>2623463130</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-Aw8Bz635aJrWg7Csn7DgwfUc0mSypO62Nc0K--9NXc-eBt55ZoZ5ELqmJKeElrdtPugYwOSMMJqinLDyBM1oJVkmeVGcolnCZFaVQpyji3FsCSElr6sZWr93foAHiHd4ESN00fddttl7CxYPh6B33uotHgJYb6Ye_oTQwXbErg94Ax0Eb3DftGAithDhF7pEZ05vR7j6q3P08fS4Xr5kq7fn1-VilRnOi5hZTlglmobWNZEFgK4LXhrnhJO1qLQ01DEhuOaCNJKlhEltHReu0Uw72fA5ujnuHUL_tYcxqrbfhy6dVKxkvCg55SRRxZEyoR_HAE4Nwe90OChK1ORPteroT03-pjT5S2P3x7H0LXx7CGo0HjqTTCQ0Ktv7_xf8APVDe8I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2623463130</pqid></control><display><type>article</type><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><source>Elsevier ScienceDirect Journals</source><creator>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</creator><creatorcontrib>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</creatorcontrib><description>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. 
Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations.  We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</description><identifier>ISSN: 0167-8655</identifier><identifier>EISSN: 1872-7344</identifier><identifier>DOI: 10.1016/j.patrec.2021.10.026</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Attention mechanism ; Detectors ; Feature enhancement ; Feature maps ; Hard negative mining ; Kernels ; Misalignment ; Modules ; Object detection ; Object recognition ; Prediction module ; Predictions ; Salience ; Sensors</subject><ispartof>Pattern recognition letters, 2021-12, Vol.152, p.302-310</ispartof><rights>2021 Elsevier B.V.</rights><rights>Copyright Elsevier Science Ltd. 
Dec 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</citedby><cites>FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0167865521003858$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Chen, Suting</creatorcontrib><creatorcontrib>Cheng, Zehua</creatorcontrib><creatorcontrib>Zhang, Liangchen</creatorcontrib><creatorcontrib>Zheng, Yujie</creatorcontrib><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><title>Pattern recognition letters</title><description>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. 
In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations.  We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</description><subject>Attention mechanism</subject><subject>Detectors</subject><subject>Feature enhancement</subject><subject>Feature maps</subject><subject>Hard negative mining</subject><subject>Kernels</subject><subject>Misalignment</subject><subject>Modules</subject><subject>Object detection</subject><subject>Object recognition</subject><subject>Prediction module</subject><subject>Predictions</subject><subject>Salience</subject><subject>Sensors</subject><issn>0167-8655</issn><issn>1872-7344</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-Aw8Bz635aJrWg7Csn7DgwfUc0mSypO62Nc0K--9NXc-eBt55ZoZ5ELqmJKeElrdtPugYwOSMMJqinLDyBM1oJVkmeVGcolnCZFaVQpyji3FsCSElr6sZWr93foAHiHd4ESN00fddttl7CxYPh6B33uotHgJYb6Ye_oTQwXbErg94Ax0Eb3DftGAithDhF7pEZ05vR7j6q3P08fS4Xr5kq7fn1-VilRnOi5hZTlglmobWNZEFgK4LXhrnhJO1qLQ01DEhuOaCNJKlhEltHReu0Uw72fA5ujnuHUL_tYcxqrbfhy6dVKxkvCg55SRRxZEyoR_HAE4Nwe90OChK1ORPteroT03-pjT5S2P3x7H0LXx7CGo0HjqTTCQ0Ktv7_xf8APVDe8I</recordid><startdate>202112</startdate><enddate>202112</enddate><creator>Chen, 
Suting</creator><creator>Cheng, Zehua</creator><creator>Zhang, Liangchen</creator><creator>Zheng, Yujie</creator><general>Elsevier B.V</general><general>Elsevier Science Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TK</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>202112</creationdate><title>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</title><author>Chen, Suting ; Cheng, Zehua ; Zhang, Liangchen ; Zheng, Yujie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-d30285bb199074eea9436cff5f7958a7c1f2553a350b7258a27adf35fba2af7b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Attention mechanism</topic><topic>Detectors</topic><topic>Feature enhancement</topic><topic>Feature maps</topic><topic>Hard negative mining</topic><topic>Kernels</topic><topic>Misalignment</topic><topic>Modules</topic><topic>Object detection</topic><topic>Object recognition</topic><topic>Prediction module</topic><topic>Predictions</topic><topic>Salience</topic><topic>Sensors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Suting</creatorcontrib><creatorcontrib>Cheng, Zehua</creatorcontrib><creatorcontrib>Zhang, Liangchen</creatorcontrib><creatorcontrib>Zheng, Yujie</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>Pattern recognition letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Suting</au><au>Cheng, Zehua</au><au>Zhang, Liangchen</au><au>Zheng, Yujie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection</atitle><jtitle>Pattern recognition letters</jtitle><date>2021-12</date><risdate>2021</risdate><volume>152</volume><spage>302</spage><epage>310</epage><pages>302-310</pages><issn>0167-8655</issn><eissn>1872-7344</eissn><abstract>•APPK module is proposed to tackle misalignments of different scales of objects.•IoU-adaptive loss function helps networks to deal with the hard negative samples.•SORR module is devised to improve the detection efficiency.•Interleaved subsampling method can enhance feature representations. Using single-scale prediction kernels or Region of Interest (RoI) pooling in the prediction modules of modern object detectors is not very successful in matching different scales of objects. State-of-the-art detectors with the feature pyramid structure built on different resolutions of feature maps can help alleviate this problem. Although with this structure, single-scale prediction kernels or RoI pooling still struggles to detect small objects, and simultaneously, the former continues to encounter the misalignment problem on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels module with a customized IoU-adaptive loss function to deal with the misalignment problem between the prediction module and different scales of objects. To mitigate the effect of heavy detection head, we also introduce the salient object regions recognition module to identify these regions that have strong object cues. 
Additionally, interleaved subsampling, as the proposed feature enhancement approach, is used to generate highly discriminative feature representations.  We refer to the detection framework constituted by these proposed methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at the speed of 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and has a better trade-off between speed and accuracy.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.patrec.2021.10.026</doi><tpages>9</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-8655
ispartof Pattern recognition letters, 2021-12, Vol.152, p.302-310
issn 0167-8655
1872-7344
language eng
recordid cdi_proquest_journals_2623463130
source Elsevier ScienceDirect Journals
subjects Attention mechanism
Detectors
Feature enhancement
Feature maps
Hard negative mining
Kernels
Misalignment
Modules
Object detection
Object recognition
Prediction module
Predictions
Salience
Sensors
title SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A27%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SnipeDet:%20Attention-guided%20pyramidal%20prediction%20kernels%20for%20generic%20object%20detection&rft.jtitle=Pattern%20recognition%20letters&rft.au=Chen,%20Suting&rft.date=2021-12&rft.volume=152&rft.spage=302&rft.epage=310&rft.pages=302-310&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2021.10.026&rft_dat=%3Cproquest_cross%3E2623463130%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623463130&rft_id=info:pmid/&rft_els_id=S0167865521003858&rfr_iscdi=true
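
The url field above is an OpenURL (Z39.88-2004) link whose query string carries the citation as key/value pairs. As a minimal sketch, assuming only Python's standard urllib.parse module, the rft.* keys can be decoded back into the bibliographic fields listed in this record. The constant OPENURL is a shortened copy of the link, and the helper names referent_fields and doi_from are hypothetical, chosen for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the record's resolver link; only some of its keys are kept.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=SnipeDet:%20Attention-guided%20pyramidal%20prediction%20kernels"
    "%20for%20generic%20object%20detection"
    "&rft.jtitle=Pattern%20recognition%20letters"
    "&rft.au=Chen,%20Suting&rft.date=2021-12&rft.volume=152"
    "&rft.spage=302&rft.epage=310&rft.issn=0167-8655&rft.eissn=1872-7344"
    "&rft_id=info:doi/10.1016/j.patrec.2021.10.026"
)


def referent_fields(url):
    """Return the rft.* keys of an OpenURL as a dict, with the prefix removed."""
    query = parse_qs(urlsplit(url).query)  # percent-decoding happens here
    return {key[4:]: values[0] for key, values in query.items()
            if key.startswith("rft.")}


def doi_from(url):
    """Return the DOI carried in an rft_id=info:doi/... pair, or None."""
    query = parse_qs(urlsplit(url).query)
    prefix = "info:doi/"
    for value in query.get("rft_id", []):
        if value.startswith(prefix):
            return value[len(prefix):]
    return None


if __name__ == "__main__":
    for name, value in sorted(referent_fields(OPENURL).items()):
        print(name, "=", value)
    print("doi =", doi_from(OPENURL))
```

The decoded atitle, jtitle, volume, spage and epage values should match the title, container_title, container_volume and page fields of the record, which makes this a quick consistency check between the link and the metadata.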