Pareto Refocusing for Drone-View Object Detection

Drone-view Object Detection (DOD) is a meaningful but challenging task. It has hit a bottleneck for two main reasons: (1) the high proportion of difficult objects (e.g., small or occluded objects) makes detection performance unsatisfactory, and (2) the uneven spatial distribution of objects makes detection inefficient. These two factors also produce a phenomenon, obeying the Pareto principle, in which the challenging regions that occupy a small proportion of the image area have a significant impact on the final detection, while the vanilla regions that occupy most of the area have a negligible impact because they leave little room for improvement. Motivated by the human visual system, which naturally invests unequal energy in objects of differing difficulty to recognize them effectively, this paper presents a novel Pareto Refocusing Detection (PRDet) network that distinguishes challenging regions from vanilla regions under reverse-attention guidance and refocuses on the challenging regions with the assistance of region-specific context. Specifically, we first propose a Reverse-attention Exploration Module (REM) that excavates the potential positions of difficult objects by suppressing the features that are salient to the commonly used detector. We then propose a Region-specific Context Learning Module (RCLM) that learns to generate specific contexts for strengthening the understanding of challenging regions. Notably, the specific context is not shared globally but is unique to each challenging region, exploiting spatial and appearance cues. Extensive experiments and comprehensive evaluations on the VisDrone2021-DET and UAVDT datasets demonstrate that the proposed PRDet effectively improves detection performance, especially for difficult objects, outperforming state-of-the-art detectors. Furthermore, our method also achieves significant performance improvements on the DTU-Drone dataset for power inspection.
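The reverse-attention idea behind REM can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendering under stated assumptions, not the authors' implementation: the 1x1-conv saliency head, the sigmoid gating, and all module and parameter names are illustrative. The core of the technique is inverting an attention map so that features the base detector already finds salient are suppressed, leaving the complement to highlight candidate difficult-object regions.

```python
# Hypothetical sketch of reverse-attention exploration (REM-style), in PyTorch.
# Assumptions: a single-channel saliency head and sigmoid gating; the paper's
# actual REM may derive its attention map differently.
import torch
import torch.nn as nn


class ReverseAttentionExploration(nn.Module):
    """Suppress detector-salient features to expose likely difficult regions."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv collapses the feature map into a single-channel saliency map.
        self.saliency = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # attention in (0, 1): high where the base detector responds strongly.
        attention = torch.sigmoid(self.saliency(features))
        # Reverse attention (1 - attention) emphasizes the complement, i.e.
        # regions the detector largely ignored: candidate challenging regions.
        return features * (1.0 - attention)


if __name__ == "__main__":
    rem = ReverseAttentionExploration(channels=256)
    feats = torch.randn(1, 256, 64, 64)  # e.g., one FPN level
    print(rem(feats).shape)              # torch.Size([1, 256, 64, 64])
```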
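In the same spirit, here is a hedged sketch of region-specific context (RCLM-style): each challenging region pools its own context from an enlarged window around its box, so the context is unique per region rather than globally shared. The box format, the 2x enlargement factor, and the linear fusion layer are illustrative assumptions; the paper's module additionally learns appearance cues beyond this simple pooling.

```python
# Hypothetical sketch of per-region context pooling (RCLM-style), in PyTorch.
# Assumptions: boxes arrive as (batch_index, x1, y1, x2, y2) rows in
# feature-map coordinates; context = RoI pooled from a 2x-enlarged window.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class RegionSpecificContext(nn.Module):
    """Fuse each region's appearance with context pooled around that region."""

    def __init__(self, channels: int, enlarge: float = 2.0):
        super().__init__()
        self.enlarge = enlarge
        self.fuse = nn.Linear(2 * channels, channels)

    def _enlarged(self, boxes: torch.Tensor) -> torch.Tensor:
        # Grow each box around its center to gather surrounding spatial context.
        b, x1, y1, x2, y2 = boxes.unbind(dim=1)
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw = (x2 - x1) * self.enlarge / 2
        hh = (y2 - y1) * self.enlarge / 2
        return torch.stack([b, cx - hw, cy - hh, cx + hw, cy + hh], dim=1)

    def forward(self, features: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # (K, C) appearance vector pooled inside each box ...
        inner = roi_align(features, boxes, output_size=1).flatten(1)
        # ... and a (K, C) context vector pooled from the enlarged window.
        outer = roi_align(features, self._enlarged(boxes), output_size=1).flatten(1)
        # The fused vector is unique to each challenging region, not shared.
        return self.fuse(torch.cat([inner, outer], dim=1))


if __name__ == "__main__":
    feats = torch.randn(1, 256, 64, 64)
    boxes = torch.tensor([[0.0, 10.0, 12.0, 30.0, 28.0]])  # one challenging region
    rclm = RegionSpecificContext(channels=256)
    print(rclm(feats, boxes).shape)  # torch.Size([1, 256])
```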

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2023-03, Vol. 33 (3), pp. 1320-1334
Authors: Leng, Jiaxu; Mo, Mengjingcheng; Zhou, Yinghua; Gao, Chenqiang; Li, Weisheng; Gao, Xinbo
Format: Article
Language: English
DOI: 10.1109/TCSVT.2022.3210207
Publisher: IEEE, New York
CODEN: ITCTEM
ISSN: 1051-8215
EISSN: 1558-2205
Source: IEEE Electronic Library (IEL)
Subjects: challenging region prediction; Context; context learning; Datasets; Detectors; Drone-view object detection; Drones; Feature extraction; Image recognition; Inspection; Modules; Object detection; Object recognition; pareto refocusing; Task analysis; Visualization