Improved YOLOv5 for aerial images based on attention mechanism

Object detection on unmanned aerial vehicle (UAV) platforms is essential for both engineering and research. The complex scale variations in UAV application scenarios demand strong regression and localization capabilities from target detection algorithms. Nonetheless, the constraints of the UAV platform make it difficult to increase accuracy simply by deepening the network. This paper therefore presents an improved YOLOv5 with an attention mechanism, consisting of a Convolution-Swin Transformer Block (CSTB) built on the Swin Transformer and a Convolutional Block Attention Module (CBAM) to improve the network's localization accuracy. In addition, the paper incorporates a Bidirectional Feature Pyramid Network (BiFPN) [1], Spatial Pyramid Pooling-Fast (SPPF), and further network components to increase average precision while keeping the model size limited. Experiments on the VisDrone2019 dataset show that the proposed approach raises the mean Average Precision (mAP) by 5.4% compared to YOLOv5, with only an 18% increase in model size.
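
The abstract names its attention components (CSTB, CBAM) without further detail in this record. As a rough, generic illustration of the kind of module involved, the following is a minimal PyTorch sketch of a CBAM-style block (channel attention followed by spatial attention). The class names, the reduction ratio of 16, the 7x7 spatial kernel, and the example feature-map shape are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    # Squeeze spatial dimensions with average and max pooling, then pass both
    # through a shared bottleneck MLP and combine into a per-channel gate.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    # Pool across the channel dimension, then a single conv produces a spatial mask.
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    # Refine a feature map with channel attention followed by spatial attention.
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)

# Example (assumed shape): refine a YOLOv5-style feature map of size (batch, channels, H, W).
feat = torch.randn(1, 256, 40, 40)
refined = CBAM(256)(feat)
print(refined.shape)  # torch.Size([1, 256, 40, 40])

In a YOLOv5-style network such a block would typically be inserted after a convolutional stage so that the attended feature map feeds the neck; where the paper actually places its modules is not specified in this record.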

Bibliographic details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main authors: Li, Zebin; Fan, Bangkui; Xu, Yulong; Sun, Renwu
Format: Article
Language: English
Publisher: Piscataway: IEEE
ISSN/EISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3277931
Subjects: Algorithms; Attention; Autonomous aerial vehicles; Convolution; Feature extraction; Object detection; Object recognition; Target detection; Training; Transformers; UAV; Unmanned aerial vehicles; YOLO
Online access: Full text (IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals)