Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection

Scale-sensitive object detection remains a challenging task: most existing methods do not learn scale explicitly and are not robust. Moreover, they are inefficient during training or slow during inference, which makes them unsuitable for real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on an analysis of the connection between dilation rate and effective receptive field. Our method first predicts a global continuous scale, shared by all positions, for each convolution filter of each network stage. Second, we average the spatial features and distill the scale from the channels to learn it effectively. Third, for fast deployment, we propose a scale decomposition method that transfers the robust fractional scale into a combination of fixed integral scales for each convolution filter, exploiting dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonal role of our method with respect to sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare it with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategies. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operations. On the COCO test-dev, our model achieves 41.7% mAP with a one-stage detector and 42.5% mAP with a two-stage detector based on ResNet-101, outperforming the baselines by 3.2% and 3.1% mAP, respectively.
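
The abstract outlines two mechanisms: predicting a single continuous scale per convolution filter from spatially averaged channel features, and decomposing that fractional scale into fixed integer dilation rates so that inference runs on standard dilated convolutions. The sketch below is a minimal, hypothetical illustration of these two ideas in PyTorch; the module name, layer shapes, the sigmoid-based scale range, and the linear blending of the two neighbouring integer dilations are assumptions made for illustration, not the paper's exact architecture.

```python
# Minimal sketch (assumed design, not the paper's exact layer):
# (1) predict one global continuous scale per output filter from globally
#     averaged features, (2) decompose the fractional scale into the two
#     neighbouring integer dilation rates and blend their responses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScalePredictedDilatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, max_scale=4.0):
        super().__init__()
        self.max_scale = max_scale
        # Shared convolution weights; only the dilation (receptive field) varies.
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01
        )
        # Predict one continuous scale per filter from spatially pooled features.
        self.scale_head = nn.Sequential(
            nn.Linear(in_ch, out_ch),
            nn.Sigmoid(),  # squash to (0, 1), then map to [1, max_scale]
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # "Average the spatial features": global average pooling -> (B, C)
        pooled = x.mean(dim=(2, 3))
        # One global fractional scale per filter, shared by all positions.
        scale = 1.0 + (self.max_scale - 1.0) * self.scale_head(pooled)  # (B, out_ch)
        scale = scale.mean(dim=0)  # assume batch-shared scales for simplicity

        # Scale decomposition: split the fractional scale into the two
        # neighbouring integer dilation rates and interpolate linearly.
        lo = scale.floor().clamp(min=1).long()
        hi = lo + 1
        frac = (scale - lo.float()).view(1, -1, 1, 1)

        out = torch.zeros(b, self.weight.shape[0], h, w, device=x.device)
        for d in torch.unique(torch.cat([lo, hi])):
            d = int(d)
            # Fixed integer dilation keeps the layer a standard dilated conv.
            resp = F.conv2d(x, self.weight, padding=d, dilation=d)
            mask_lo = (lo == d).view(1, -1, 1, 1).float()
            mask_hi = (hi == d).view(1, -1, 1, 1).float()
            out = out + resp * (mask_lo * (1.0 - frac) + mask_hi * frac)
        return out


if __name__ == "__main__":
    layer = ScalePredictedDilatedConv(in_ch=64, out_ch=128)
    y = layer(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 128, 32, 32])
```

Because every dilation rate actually applied at inference is a fixed integer, a layer of this kind reduces to ordinary dilated convolutions, which is consistent with the abstract's claim that the model can be accelerated with TensorRT without extra operations.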

Bibliographic Details
Published in: International journal of computer vision, 2022-04, Vol. 130 (4), p. 970-989
Main authors: Zhang, Zhaoxiang; Pan, Cong; Peng, Junran
Format: Article
Language: English
Subjects: Accuracy; Algorithms; Analysis; Artificial Intelligence; Computer Imaging; Computer Science; Convolution; Data sampling; Decomposition; Detectors; Image Processing and Computer Vision; Inference; Learning; Methods; Network latency; Neural networks; Object recognition; Optimization; Orthogonality; Pattern Recognition; Pattern Recognition and Graphics; Science; Sensors; Training; Vision
Online access: Full text
DOI: 10.1007/s11263-021-01573-6
ISSN: 0920-5691
EISSN: 1573-1405
Publisher: Springer US, New York
Source: SpringerLink Journals