SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection
• The APPK module is proposed to tackle misalignment across objects of different scales. • An IoU-adaptive loss function helps networks deal with hard negative samples. • The SORR module is devised to improve detection efficiency. • Interleaved subsampling can enhance feature representations. Using sin...
Published in: | Pattern recognition letters 2021-12, Vol.152, p.302-310 |
---|---|
Main authors: | Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 310 |
---|---|
container_issue | |
container_start_page | 302 |
container_title | Pattern recognition letters |
container_volume | 152 |
creator | Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie |
description | • The APPK module is proposed to tackle misalignment across objects of different scales. • An IoU-adaptive loss function helps networks deal with hard negative samples. • The SORR module is devised to improve detection efficiency. • Interleaved subsampling can enhance feature representations.
Single-scale prediction kernels and Region of Interest (RoI) pooling, as used in the prediction modules of modern object detectors, do not match objects of different scales well. State-of-the-art detectors that build a feature pyramid over feature maps of different resolutions help alleviate this problem. Even with this structure, however, single-scale prediction kernels and RoI pooling still struggle to detect small objects, and the former still suffer from misalignment on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels (APPK) module, with a customized IoU-adaptive loss function, to deal with the misalignment between the prediction module and objects of different scales. To mitigate the effect of a heavy detection head, we also introduce the salient object regions recognition (SORR) module, which identifies regions that have strong object cues. Additionally, interleaved subsampling, the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework built from these methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and offers a better trade-off between speed and accuracy. |
doi_str_mv | 10.1016/j.patrec.2021.10.026 |
format | Article |
publisher | Amsterdam: Elsevier B.V |
rights | 2021 Elsevier B.V. |
tpages | 9 |
fulltext | fulltext |
identifier | ISSN: 0167-8655 |
ispartof | Pattern recognition letters, 2021-12, Vol.152, p.302-310 |
issn | ISSN 0167-8655; EISSN 1872-7344 |
language | eng |
recordid | cdi_proquest_journals_2623463130 |
source | Elsevier ScienceDirect Journals |
subjects | Attention mechanism; Detectors; Feature enhancement; Feature maps; Hard negative mining; Kernels; Misalignment; Modules; Object detection; Object recognition; Prediction module; Predictions; Salience; Sensors |
title | SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T20%3A27%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SnipeDet:%20Attention-guided%20pyramidal%20prediction%20kernels%20for%20generic%20object%20detection&rft.jtitle=Pattern%20recognition%20letters&rft.au=Chen,%20Suting&rft.date=2021-12&rft.volume=152&rft.spage=302&rft.epage=310&rft.pages=302-310&rft.issn=0167-8655&rft.eissn=1872-7344&rft_id=info:doi/10.1016/j.patrec.2021.10.026&rft_dat=%3Cproquest_cross%3E2623463130%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2623463130&rft_id=info:pmid/&rft_els_id=S0167865521003858&rfr_iscdi=true |