Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection

Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a sc...

Bibliographic details
Published in:	International journal of computer vision 2022-04, Vol.130 (4), p.970-989
Main authors:	Zhang, Zhaoxiang; Pan, Cong; Peng, Junran
Format:	Article
Language:	eng
Subjects:	Accuracy; Algorithms; Analysis; Artificial Intelligence; Computer Imaging; Computer Science; Convolution; Data sampling; Decomposition; Detectors; Image Processing and Computer Vision; Inference; Learning; Methods; Network latency; Neural networks; Object recognition; Optimization; Orthogonality; Pattern Recognition; Pattern Recognition and Graphics; Science; Sensors; Training; Vision
Online access:	Full text

container_end_page	989
container_issue	4
container_start_page	970
container_title	International journal of computer vision
container_volume	130
creator	Zhang, Zhaoxiang; Pan, Cong; Peng, Junran
description	Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. On the COCO test-dev, our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively.
doi_str_mv	10.1007/s11263-021-01573-6
format	Article
fullrecord	<record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2642698379</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A698233293</galeid><sourcerecordid>A698233293</sourcerecordid><originalsourceid>FETCH-LOGICAL-c343t-fcd0630c7c1fd1eb1d620f19c4332de8b9087af1637091655ac98708f891d5d83</originalsourceid><addsrcrecordid>eNp9kU1vGyEQhlHVSHXT_oGekHLKYdMBvOySm5WPNpKlVPk4I8wODtYGHMBRe8h_D85WqnKpOIBmnvedQS8h3xicMIDue2aMS9EAZw2wthON_EBmbw82h_YjmYHi0LRSsU_kc84bAOA9FzPyco7jsw9r6kOJtDwgvXAObfHPGDBnGh29QYvbfYFeehyHfEqXaFLYi26tGbG5SyZkhymZ1Yh0keyDL9VilzBTFxP9lUw1rCi9Xm1qg57jvu9j-EIOnBkzfv17H5L7y4u7s5_N8vrH1dli2VgxF6VxdgApwHaWuYHhig2Sg2PKzoXgA_YrBX1nHJOiA8Vk2xqr-g561ys2tEMvDsnR5LtN8WmHuehN3KVQR2ou51yqXnSqUicTta6_0j64WOrm9Qz46G0M6HytLyrN61wlquD4naAyBX-XtdnlrK9ub96zfGJtijkndHqb_KNJfzQDvc9QTxnqmqF-y1DLKhKTKFc4rDH92_s_qldKFZ67</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2642698379</pqid></control><display><type>article</type><title>Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection</title><source>SpringerLink Journals - AutoHoldings</source><creator>Zhang, Zhaoxiang ; Pan, Cong ; Peng, Junran</creator><creatorcontrib>Zhang, Zhaoxiang ; Pan, Cong ; Peng, Junran</creatorcontrib><description>Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. 
Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. 
On the COCO test-dev , our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively.</description><identifier>ISSN: 0920-5691</identifier><identifier>EISSN: 1573-1405</identifier><identifier>DOI: 10.1007/s11263-021-01573-6</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Accuracy ; Algorithms ; Analysis ; Artificial Intelligence ; Computer Imaging ; Computer Science ; Convolution ; Data sampling ; Decomposition ; Detectors ; Image Processing and Computer Vision ; Inference ; Learning ; Methods ; Network latency ; Neural networks ; Object recognition ; Optimization ; Orthogonality ; Pattern Recognition ; Pattern Recognition and Graphics ; Science ; Sensors ; Training ; Vision</subject><ispartof>International journal of computer vision, 2022-04, Vol.130 (4), p.970-989</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022</rights><rights>COPYRIGHT 2022 Springer</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c343t-fcd0630c7c1fd1eb1d620f19c4332de8b9087af1637091655ac98708f891d5d83</cites><orcidid>0000-0001-5959-4294</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11263-021-01573-6$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11263-021-01573-6$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Zhang, Zhaoxiang</creatorcontrib><creatorcontrib>Pan, 
Cong</creatorcontrib><creatorcontrib>Peng, Junran</creatorcontrib><title>Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection</title><title>International journal of computer vision</title><addtitle>Int J Comput Vis</addtitle><description>Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. 
On the COCO test-dev , our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Artificial Intelligence</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Convolution</subject><subject>Data sampling</subject><subject>Decomposition</subject><subject>Detectors</subject><subject>Image Processing and Computer Vision</subject><subject>Inference</subject><subject>Learning</subject><subject>Methods</subject><subject>Network latency</subject><subject>Neural networks</subject><subject>Object recognition</subject><subject>Optimization</subject><subject>Orthogonality</subject><subject>Pattern Recognition</subject><subject>Pattern Recognition and Graphics</subject><subject>Science</subject><subject>Sensors</subject><subject>Training</subject><subject>Vision</subject><issn>0920-5691</issn><issn>1573-1405</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kU1vGyEQhlHVSHXT_oGekHLKYdMBvOySm5WPNpKlVPk4I8wODtYGHMBRe8h_D85WqnKpOIBmnvedQS8h3xicMIDue2aMS9EAZw2wthON_EBmbw82h_YjmYHi0LRSsU_kc84bAOA9FzPyco7jsw9r6kOJtDwgvXAObfHPGDBnGh29QYvbfYFeehyHfEqXaFLYi26tGbG5SyZkhymZ1Yh0keyDL9VilzBTFxP9lUw1rCi9Xm1qg57jvu9j-EIOnBkzfv17H5L7y4u7s5_N8vrH1dli2VgxF6VxdgApwHaWuYHhig2Sg2PKzoXgA_YrBX1nHJOiA8Vk2xqr-g561ys2tEMvDsnR5LtN8WmHuehN3KVQR2ou51yqXnSqUicTta6_0j64WOrm9Qz46G0M6HytLyrN61wlquD4naAyBX-XtdnlrK9ub96zfGJtijkndHqb_KNJfzQDvc9QTxnqmqF-y1DLKhKTKFc4rDH92_s_qldKFZ67</recordid><startdate>20220401</startdate><enddate>20220401</enddate><creator>Zhang, Zhaoxiang</creator><creator>Pan, Cong</creator><creator>Peng, Junran</creator><general>Springer US</general><general>Springer</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYYUZ</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-5959-4294</orcidid></search><sort><creationdate>20220401</creationdate><title>Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection</title><author>Zhang, Zhaoxiang ; Pan, Cong ; Peng, Junran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c343t-fcd0630c7c1fd1eb1d620f19c4332de8b9087af1637091655ac98708f891d5d83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Artificial Intelligence</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Convolution</topic><topic>Data sampling</topic><topic>Decomposition</topic><topic>Detectors</topic><topic>Image Processing and Computer Vision</topic><topic>Inference</topic><topic>Learning</topic><topic>Methods</topic><topic>Network latency</topic><topic>Neural networks</topic><topic>Object 
recognition</topic><topic>Optimization</topic><topic>Orthogonality</topic><topic>Pattern Recognition</topic><topic>Pattern Recognition and Graphics</topic><topic>Science</topic><topic>Sensors</topic><topic>Training</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Zhaoxiang</creatorcontrib><creatorcontrib>Pan, Cong</creatorcontrib><creatorcontrib>Peng, Junran</creatorcontrib><collection>CrossRef</collection><collection>Science In Context</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest 
Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of computer vision</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Zhaoxiang</au><au>Pan, Cong</au><au>Peng, Junran</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection</atitle><jtitle>International journal of computer vision</jtitle><stitle>Int J Comput Vis</stitle><date>2022-04-01</date><risdate>2022</risdate><volume>130</volume><issue>4</issue><spage>970</spage><epage>989</epage><pages>970-989</pages><issn>0920-5691</issn><eissn>1573-1405</eissn><abstract>Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. 
Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. On the COCO test-dev , our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11263-021-01573-6</doi><tpages>20</tpages><orcidid>https://orcid.org/0000-0001-5959-4294</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0920-5691
ispartof	International journal of computer vision, 2022-04, Vol.130 (4), p.970-989
issn	0920-5691; 1573-1405
language	eng
recordid	cdi_proquest_journals_2642698379
source	SpringerLink Journals - AutoHoldings
subjects	Accuracy; Algorithms; Analysis; Artificial Intelligence; Computer Imaging; Computer Science; Convolution; Data sampling; Decomposition; Detectors; Image Processing and Computer Vision; Inference; Learning; Methods; Network latency; Neural networks; Object recognition; Optimization; Orthogonality; Pattern Recognition; Pattern Recognition and Graphics; Science; Sensors; Training; Vision
title	Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A28%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Delving%20into%20the%20Effectiveness%20of%20Receptive%20Fields:%20Learning%20Scale-Transferrable%20Architectures%20for%20Practical%20Object%20Detection&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Zhang,%20Zhaoxiang&rft.date=2022-04-01&rft.volume=130&rft.issue=4&rft.spage=970&rft.epage=989&rft.pages=970-989&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-021-01573-6&rft_dat=%3Cgale_proqu%3EA698233293%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2642698379&rft_id=info:pmid/&rft_galeid=A698233293&rfr_iscdi=true