Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA

The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to the satellite’s limited hardware resources, space, and power consumption constraints, it is difficult to process massive remote sensing images efficiently...

Bibliographic details
Published in: Electronics (Basel) 2022-11, Vol.11 (21), p.3473
Main authors: Wang, Ling; Zhou, Hai; Bian, Chunjiang; Jiang, Kangning; Cheng, Xiaolei
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 21
container_start_page 3473
container_title Electronics (Basel)
container_volume 11
creator Wang, Ling
Zhou, Hai
Bian, Chunjiang
Jiang, Kangning
Cheng, Xiaolei
description The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to the satellite’s limited hardware resources, space, and power consumption constraints, it is difficult to process massive remote sensing images efficiently and robustly using the traditional remote sensing image processing methods. Additionally, the task of satellite-to-ground target detection has higher requirements for speed and accuracy as the volume of remote sensing data keeps growing. To solve these problems, this paper proposes an extremely efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. Considering the limited onboard resources, the design strategy of parallel loop unrolling of the input channels and output channels is adopted to build the largest DSP computing array, ensuring reliable and full utilization of the limited computing resources and thus reducing the inference delay of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cache data, effectively reduce the bandwidth bottleneck of the external memory, and ensure efficient computation across the entire computing array. The experimental results show that at the 200 MHz operating frequency of the VC709, the overall inference performance of the FPGA acceleration can reach 399.62 GOPS, the peak performance can reach 408.4 GOPS, and the overall computing efficiency of the DSP array can reach 97.56%. Compared with previous work, our architecture design further improves the computing efficiency under limited hardware resources.
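The throughput figures in the description can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract does not state the size of the DSP array, so the value of 1024 multiply-accumulate (MAC) units is an assumption, as are the helper names `peak_gops` and `efficiency`. It also assumes the usual convention of counting one MAC as two operations.

```python
def peak_gops(mac_units: int, clock_mhz: float) -> float:
    """Theoretical peak in GOPS, counting each multiply-accumulate as 2 operations."""
    return 2 * mac_units * clock_mhz * 1e6 / 1e9


def efficiency(measured_gops: float, theoretical_gops: float) -> float:
    """Fraction of the theoretical peak that is actually achieved."""
    return measured_gops / theoretical_gops


# ASSUMPTION: the array size is not given in the abstract; 1024 is hypothetical.
ASSUMED_MAC_UNITS = 1024
CLOCK_MHZ = 200.0            # operating frequency reported in the abstract
OVERALL_GOPS = 399.62        # overall inference performance from the abstract

theoretical = peak_gops(ASSUMED_MAC_UNITS, CLOCK_MHZ)
overall = efficiency(OVERALL_GOPS, theoretical)
print(f"theoretical peak: {theoretical:.1f} GOPS, efficiency: {overall:.2%}")
```

Under that assumed array size, the ratio comes out close to the 97.56% efficiency quoted in the abstract, which suggests (but does not confirm) that the reported efficiency is measured against a theoretical peak rather than against the 408.4 GOPS measured peak.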
doi_str_mv 10.3390/electronics11213473
format Article
fullrecord <record><control><sourceid>gale_proqu</sourceid><recordid>TN_cdi_proquest_journals_2734621839</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A745598007</galeid><sourcerecordid>A745598007</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-4205989aa4a8927145c8a119652d9bf0f49a275fb620e36e6172e62a70d0e50a3</originalsourceid><addsrcrecordid>eNptUE1LxDAQDaLgsu4v8FLw3DUfbdMcy7JfsFAPCnoqs-lEurTJmnQR_72RevDgzGGGx3tvhkfIPaNLIRR9xB716J3tdGCMM5FJcUVmnEqVKq749Z_9lixCONFYiolS0BlZ78C3n-AxqbSOTh7GztkEbJvsh3OPA9pxgpxJ3upD_ZqGxDif1Dat_bEbk83TtrojNwb6gIvfOScvm_Xzapce6u1-VR1SLQo2phmnuSoVQAal4pJluS6BMVXkvFVHQ02mgMvcHAtOURRYMMmx4CBpSzGnIObkYfI9e_dxwTA2J3fxNp5suBRZwVkpVGQtJ9Y79Nh01rjRg47d4tBpZ9F0Ea9klsdvKJVRICaB9i4Ej6Y5-24A_9Uw2vxk3PyTsfgGVGJvLQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2734621839</pqid></control><display><type>article</type><title>Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Wang, Ling ; Zhou, Hai ; Bian, Chunjiang ; Jiang, Kangning ; Cheng, Xiaolei</creator><creatorcontrib>Wang, Ling ; Zhou, Hai ; Bian, Chunjiang ; Jiang, Kangning ; Cheng, Xiaolei</creatorcontrib><description>The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to the satellite’s limited hardware resources, space, and power consumption constraints, it is difficult to process massive remote sensing images efficiently and robustly using the traditional remote sensing image processing methods. Additionally, the task of satellite-to-ground target detection has higher requirements for speed and accuracy under the conditions of more and more remote sensing data. 
To solve these problems, this paper proposes an extremely efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. Considering the limited onboard resources, the design strategy of the parallel loop unrolling of the input channels and output channels is adopted to build the largest DSP computing array to ensure a reliable and full utilization of the limited computing resources, thus reducing the inference delay of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cache data, effectively reduce the bandwidth bottleneck of the external memory, and ensure an efficient computing of the entire computing array. The experimental results show that at the 200 MHz operating frequency of the VC709, the overall inference performance of the FPGA acceleration can reach 399.62 GOPS, the peak performance can reach 408.4 GOPS, and the overall computing efficiency of the DSP array can reach 97.56%. 
Compared with the previous work, our architecture design further improves the computing efficiency under limited hardware resources.</description><identifier>ISSN: 2079-9292</identifier><identifier>EISSN: 2079-9292</identifier><identifier>DOI: 10.3390/electronics11213473</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>Acceleration ; Accuracy ; Aerospace environments ; Algorithms ; Bandwidths ; Channels ; Computer architecture ; Computing time ; Design and construction ; Design optimization ; Digital integrated circuits ; Efficiency ; Field programmable gate arrays ; Hardware ; Image processing ; Inference ; Neural networks ; Object recognition (Computers) ; Pattern recognition ; Power consumption ; Remote sensing ; Satellite imagery ; Satellites ; Target detection</subject><ispartof>Electronics (Basel), 2022-11, Vol.11 (21), p.3473</ispartof><rights>COPYRIGHT 2022 MDPI AG</rights><rights>2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-4205989aa4a8927145c8a119652d9bf0f49a275fb620e36e6172e62a70d0e50a3</citedby><cites>FETCH-LOGICAL-c361t-4205989aa4a8927145c8a119652d9bf0f49a275fb620e36e6172e62a70d0e50a3</cites><orcidid>0000-0003-4925-5971</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27922,27923</link.rule.ids></links><search><creatorcontrib>Wang, Ling</creatorcontrib><creatorcontrib>Zhou, Hai</creatorcontrib><creatorcontrib>Bian, Chunjiang</creatorcontrib><creatorcontrib>Jiang, Kangning</creatorcontrib><creatorcontrib>Cheng, Xiaolei</creatorcontrib><title>Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA</title><title>Electronics (Basel)</title><description>The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to the satellite’s limited hardware resources, space, and power consumption constraints, it is difficult to process massive remote sensing images efficiently and robustly using the traditional remote sensing image processing methods. Additionally, the task of satellite-to-ground target detection has higher requirements for speed and accuracy under the conditions of more and more remote sensing data. To solve these problems, this paper proposes an extremely efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. 
Considering the limited onboard resources, the design strategy of the parallel loop unrolling of the input channels and output channels is adopted to build the largest DSP computing array to ensure a reliable and full utilization of the limited computing resources, thus reducing the inference delay of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cache data, effectively reduce the bandwidth bottleneck of the external memory, and ensure an efficient computing of the entire computing array. The experimental results show that at the 200 MHz operating frequency of the VC709, the overall inference performance of the FPGA acceleration can reach 399.62 GOPS, the peak performance can reach 408.4 GOPS, and the overall computing efficiency of the DSP array can reach 97.56%. Compared with the previous work, our architecture design further improves the computing efficiency under limited hardware resources.</description><subject>Acceleration</subject><subject>Accuracy</subject><subject>Aerospace environments</subject><subject>Algorithms</subject><subject>Bandwidths</subject><subject>Channels</subject><subject>Computer architecture</subject><subject>Computing time</subject><subject>Design and construction</subject><subject>Design optimization</subject><subject>Digital integrated circuits</subject><subject>Efficiency</subject><subject>Field programmable gate arrays</subject><subject>Hardware</subject><subject>Image processing</subject><subject>Inference</subject><subject>Neural networks</subject><subject>Object recognition (Computers)</subject><subject>Pattern recognition</subject><subject>Power consumption</subject><subject>Remote sensing</subject><subject>Satellite imagery</subject><subject>Satellites</subject><subject>Target 
detection</subject><issn>2079-9292</issn><issn>2079-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNptUE1LxDAQDaLgsu4v8FLw3DUfbdMcy7JfsFAPCnoqs-lEurTJmnQR_72RevDgzGGGx3tvhkfIPaNLIRR9xB716J3tdGCMM5FJcUVmnEqVKq749Z_9lixCONFYiolS0BlZ78C3n-AxqbSOTh7GztkEbJvsh3OPA9pxgpxJ3upD_ZqGxDif1Dat_bEbk83TtrojNwb6gIvfOScvm_Xzapce6u1-VR1SLQo2phmnuSoVQAal4pJluS6BMVXkvFVHQ02mgMvcHAtOURRYMMmx4CBpSzGnIObkYfI9e_dxwTA2J3fxNp5suBRZwVkpVGQtJ9Y79Nh01rjRg47d4tBpZ9F0Ea9klsdvKJVRICaB9i4Ej6Y5-24A_9Uw2vxk3PyTsfgGVGJvLQ</recordid><startdate>20221101</startdate><enddate>20221101</enddate><creator>Wang, Ling</creator><creator>Zhou, Hai</creator><creator>Bian, Chunjiang</creator><creator>Jiang, Kangning</creator><creator>Cheng, Xiaolei</creator><general>MDPI AG</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L7M</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><orcidid>https://orcid.org/0000-0003-4925-5971</orcidid></search><sort><creationdate>20221101</creationdate><title>Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA</title><author>Wang, Ling ; Zhou, Hai ; Bian, Chunjiang ; Jiang, Kangning ; Cheng, 
Xiaolei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-4205989aa4a8927145c8a119652d9bf0f49a275fb620e36e6172e62a70d0e50a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Acceleration</topic><topic>Accuracy</topic><topic>Aerospace environments</topic><topic>Algorithms</topic><topic>Bandwidths</topic><topic>Channels</topic><topic>Computer architecture</topic><topic>Computing time</topic><topic>Design and construction</topic><topic>Design optimization</topic><topic>Digital integrated circuits</topic><topic>Efficiency</topic><topic>Field programmable gate arrays</topic><topic>Hardware</topic><topic>Image processing</topic><topic>Inference</topic><topic>Neural networks</topic><topic>Object recognition (Computers)</topic><topic>Pattern recognition</topic><topic>Power consumption</topic><topic>Remote sensing</topic><topic>Satellite imagery</topic><topic>Satellites</topic><topic>Target detection</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Ling</creatorcontrib><creatorcontrib>Zhou, Hai</creatorcontrib><creatorcontrib>Bian, Chunjiang</creatorcontrib><creatorcontrib>Jiang, Kangning</creatorcontrib><creatorcontrib>Cheng, Xiaolei</creatorcontrib><collection>CrossRef</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central 
Korea</collection><collection>SciTech Premium Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><jtitle>Electronics (Basel)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Ling</au><au>Zhou, Hai</au><au>Bian, Chunjiang</au><au>Jiang, Kangning</au><au>Cheng, Xiaolei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA</atitle><jtitle>Electronics (Basel)</jtitle><date>2022-11-01</date><risdate>2022</risdate><volume>11</volume><issue>21</issue><spage>3473</spage><pages>3473-</pages><issn>2079-9292</issn><eissn>2079-9292</eissn><abstract>The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to the satellite’s limited hardware resources, space, and power consumption constraints, it is difficult to process massive remote sensing images efficiently and robustly using the traditional remote sensing image processing methods. Additionally, the task of satellite-to-ground target detection has higher requirements for speed and accuracy under the conditions of more and more remote sensing data. To solve these problems, this paper proposes an extremely efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. 
Considering the limited onboard resources, the design strategy of the parallel loop unrolling of the input channels and output channels is adopted to build the largest DSP computing array to ensure a reliable and full utilization of the limited computing resources, thus reducing the inference delay of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cache data, effectively reduce the bandwidth bottleneck of the external memory, and ensure an efficient computing of the entire computing array. The experimental results show that at the 200 MHz operating frequency of the VC709, the overall inference performance of the FPGA acceleration can reach 399.62 GOPS, the peak performance can reach 408.4 GOPS, and the overall computing efficiency of the DSP array can reach 97.56%. Compared with the previous work, our architecture design further improves the computing efficiency under limited hardware resources.</abstract><cop>Basel</cop><pub>MDPI AG</pub><doi>10.3390/electronics11213473</doi><orcidid>https://orcid.org/0000-0003-4925-5971</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2079-9292
ispartof Electronics (Basel), 2022-11, Vol.11 (21), p.3473
issn 2079-9292
2079-9292
language eng
recordid cdi_proquest_journals_2734621839
source MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects Acceleration
Accuracy
Aerospace environments
Algorithms
Bandwidths
Channels
Computer architecture
Computing time
Design and construction
Design optimization
Digital integrated circuits
Efficiency
Field programmable gate arrays
Hardware
Image processing
Inference
Neural networks
Object recognition (Computers)
Pattern recognition
Power consumption
Remote sensing
Satellite imagery
Satellites
Target detection
title Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T00%3A11%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hardware%20Acceleration%20and%20Implementation%20of%20YOLOX-s%20for%20On-Orbit%20FPGA&rft.jtitle=Electronics%20(Basel)&rft.au=Wang,%20Ling&rft.date=2022-11-01&rft.volume=11&rft.issue=21&rft.spage=3473&rft.pages=3473-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics11213473&rft_dat=%3Cgale_proqu%3EA745598007%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2734621839&rft_id=info:pmid/&rft_galeid=A745598007&rfr_iscdi=true