Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection
Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a sc...
Published in: | International journal of computer vision 2022-04, Vol.130 (4), p.970-989 |
---|---|
Main authors: | Zhang, Zhaoxiang ; Pan, Cong ; Peng, Junran |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 989 |
---|---|
container_issue | 4 |
container_start_page | 970 |
container_title | International journal of computer vision |
container_volume | 130 |
creator | Zhang, Zhaoxiang ; Pan, Cong ; Peng, Junran |
description | Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. On the COCO test-dev, our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively. |
doi_str_mv | 10.1007/s11263-021-01573-6 |
format | Article |
publisher | New York: Springer US |
rights | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022 |
stitle | Int J Comput Vis |
tpages | 20 |
orcidid | https://orcid.org/0000-0001-5959-4294 |
fulltext | fulltext |
identifier | ISSN: 0920-5691 |
ispartof | International journal of computer vision, 2022-04, Vol.130 (4), p.970-989 |
issn | 0920-5691 1573-1405 |
language | eng |
recordid | cdi_proquest_journals_2642698379 |
source | SpringerLink Journals - AutoHoldings |
subjects | Accuracy ; Algorithms ; Analysis ; Artificial Intelligence ; Computer Imaging ; Computer Science ; Convolution ; Data sampling ; Decomposition ; Detectors ; Image Processing and Computer Vision ; Inference ; Learning ; Methods ; Network latency ; Neural networks ; Object recognition ; Optimization ; Orthogonality ; Pattern Recognition ; Pattern Recognition and Graphics ; Science ; Sensors ; Training ; Vision |
title | Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T16%3A28%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Delving%20into%20the%20Effectiveness%20of%20Receptive%20Fields:%20Learning%20Scale-Transferrable%20Architectures%20for%20Practical%20Object%20Detection&rft.jtitle=International%20journal%20of%20computer%20vision&rft.au=Zhang,%20Zhaoxiang&rft.date=2022-04-01&rft.volume=130&rft.issue=4&rft.spage=970&rft.epage=989&rft.pages=970-989&rft.issn=0920-5691&rft.eissn=1573-1405&rft_id=info:doi/10.1007/s11263-021-01573-6&rft_dat=%3Cgale_proqu%3EA698233293%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2642698379&rft_id=info:pmid/&rft_galeid=A698233293&rfr_iscdi=true |
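
The abstract describes decomposing a learned fractional scale into a combination of fixed integral dilation rates per convolution filter. The record does not give the procedure itself, so the following is only an illustrative sketch under one assumed interpretation: the filters of a layer are split between the two nearest integer dilations so that their average matches the fractional scale. The function names (`effective_kernel_size`, `decompose_scale`, `mean_dilation`) and the rounding rule are hypothetical and are not taken from the paper.

```python
import math


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def decompose_scale(scale: float, num_filters: int) -> dict:
    """Assumed scheme (not confirmed by the source): assign each filter either
    floor(scale) or ceil(scale) so the mean dilation approximates `scale`.

    Returns a mapping {integer dilation: number of filters}.
    """
    if scale < 1 or num_filters < 1:
        raise ValueError("scale and num_filters must both be >= 1")
    low, high = math.floor(scale), math.ceil(scale)
    if low == high:
        # Already an integer dilation; nothing to split.
        return {low: num_filters}
    # The fractional part decides how many filters get the larger dilation.
    n_high = round((scale - low) * num_filters)
    n_low = num_filters - n_high
    return {d: n for d, n in ((low, n_low), (high, n_high)) if n > 0}


def mean_dilation(allocation: dict) -> float:
    """Average dilation over all filters in an allocation."""
    total = sum(allocation.values())
    return sum(d * n for d, n in allocation.items()) / total


if __name__ == "__main__":
    allocation = decompose_scale(1.75, 64)
    print(allocation, mean_dilation(allocation))
    print(effective_kernel_size(3, 2))
```

Because every filter ends up with an ordinary integer dilation, a layout like this could in principle run on standard dilated convolution kernels, which is consistent with the abstract's remark about deployment without extra operations. How the paper actually assigns dilations to filters should be checked against the full text.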