A dedicated hardware accelerator for real-time acceleration of YOLOv2

In recent years, dedicated hardware accelerators for the acceleration of the convolutional neural network (CNN) have been extensively studied. Although many studies have presented efficient designs on FPGAs for image classification neural network models such as AlexNet and VGG, there are still littl...

Bibliographic details
Published in: Journal of real-time image processing, 2021-06, Vol. 18 (3), p. 481-492
Main authors: Xu, Ke; Wang, Xiaoyun; Liu, Xinyang; Cao, Changfeng; Li, Huolin; Peng, Haiyong; Wang, Dong
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 492
container_issue 3
container_start_page 481
container_title Journal of real-time image processing
container_volume 18
creator Xu, Ke
Wang, Xiaoyun
Liu, Xinyang
Cao, Changfeng
Li, Huolin
Peng, Haiyong
Wang, Dong
description In recent years, dedicated hardware accelerators for convolutional neural networks (CNNs) have been extensively studied. Although many studies have presented efficient FPGA designs for image classification models such as AlexNet and VGG, there are still few implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on an Arria-10 GX1150 FPGA. The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input images and a full 8-bit fixed-point datapath to improve hardware resource utilization. A layer fusion technique that merges convolution, batch normalization and Leaky-ReLU is also developed to avoid transferring intermediate data between the FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s at a working frequency of 190 MHz. The accelerator can execute YOLOv2 inference (288 × 288 resolution) and tiny YOLOv2 inference (416 × 416 resolution) at 35 and 71 FPS, respectively.
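note The layer fusion named in the description can be illustrated with a small sketch. The record does not give the authors' implementation, so what follows is only the standard inference-time folding of batch normalization into the preceding convolution, shown in floating point on a single pixel of a 1×1 convolution. The function names, the sample values and the Leaky-ReLU slope of 0.1 are assumptions made for illustration; the paper's 8-bit fixed-point datapath is not modelled.

```python
import math


def leaky_relu(x, slope=0.1):
    # slope=0.1 is an assumed value, not taken from the record
    return x if x > 0 else slope * x


def conv_bn_leaky(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: 1x1 convolution, then batch norm, then Leaky-ReLU."""
    out = []
    for c in range(len(w)):
        y = sum(wi * xi for wi, xi in zip(w[c], x)) + b[c]
        y = gamma[c] * (y - mean[c]) / math.sqrt(var[c] + eps) + beta[c]
        out.append(leaky_relu(y))
    return out


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the fixed batch-norm statistics into the conv weights and biases."""
    w_folded, b_folded = [], []
    for c in range(len(w)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        w_folded.append([scale * wi for wi in w[c]])
        b_folded.append(scale * (b[c] - mean[c]) + beta[c])
    return w_folded, b_folded


def fused_conv_leaky(x, w_folded, b_folded):
    """Fused layer: one multiply-accumulate pass followed by the activation."""
    return [
        leaky_relu(sum(wi * xi for wi, xi in zip(w_folded[c], x)) + b_folded[c])
        for c in range(len(w_folded))
    ]


# Hypothetical values: 3 input channels, 2 output channels, one pixel.
x = [0.5, -1.0, 2.0]
w = [[0.2, -0.4, 0.1], [-0.3, 0.5, -0.7]]
b = [0.05, -0.1]
gamma, beta = [1.5, 0.8], [0.1, -0.2]
mean, var = [0.3, -0.5], [0.25, 4.0]

reference = conv_bn_leaky(x, w, b, gamma, beta, mean, var)
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
fused = fused_conv_leaky(x, w_f, b_f)
```

Because the batch-norm parameters are constants at inference time, the folded layer produces the same values as the three-step reference while needing no intermediate feature map between the steps, which is the kind of external-memory traffic the description says the design avoids.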
doi_str_mv 10.1007/s11554-020-00977-w
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918675736</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918675736</sourcerecordid><originalsourceid>FETCH-LOGICAL-c385t-8126dc34ea818e9046a83c885122c47faf7f26aef41e81357ad0599d6b3b36fb3</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWKt_wNOC5-gk2XzssRS_oNCLHjyFNJnolrZbk63Ff290RT15GGZg3vcd5iHknMElA9BXmTEpawocKECjNd0fkBEzilHDWXP4MwMck5OclwBKKyFH5HpSBQytdz2G6sWlsHcJK-c9rjC5vktVLJXQrWjfrv9s2m5TdbF6ms_mb_yUHEW3ynj23cfk8eb6YXpHZ_Pb--lkRr0wsqeGcRW8qNEZZrCBWjkjvDGSce5rHV3UkSuHsWZomJDaBZBNE9RCLISKCzEmF0PuNnWvO8y9XXa7tCknLW_Ki1pqoYqKDyqfupwTRrtN7dqld8vAfuKyAy5bcNkvXHZfTGIw5SLePGP6jf7H9QHQK2y9</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918675736</pqid></control><display><type>article</type><title>A dedicated hardware accelerator for real-time acceleration of YOLOv2</title><source>SpringerLink Journals</source><source>ProQuest Central</source><creator>Xu, Ke ; Wang, Xiaoyun ; Liu, Xinyang ; Cao, Changfeng ; Li, Huolin ; Peng, Haiyong ; Wang, Dong</creator><creatorcontrib>Xu, Ke ; Wang, Xiaoyun ; Liu, Xinyang ; Cao, Changfeng ; Li, Huolin ; Peng, Haiyong ; Wang, Dong</creatorcontrib><description>In recent years, dedicated hardware accelerators for the acceleration of the convolutional neural network (CNN) have been extensively studied. Although many studies have presented efficient designs on FPGAs for image classification neural network models such as AlexNet and VGG, there are still little implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on Arria-10 GX1150 FPGA. 
The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input image and full 8-bit fixed-point datapath to improve hardware resource utilization. Layer fusion technology that merges the convolution, batch normalization and Leaky-ReLU is also developed to avoid transmission of intermediate data between FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s under the working frequency of 190 MHz. The accelerator can execute YOLOv2 inference computation ( 288 × 288 resolution) and tiny YOLOv2 ( 416 × 416 resolution) at the speed of 35 and 71 FPS, respectively.</description><identifier>ISSN: 1861-8200</identifier><identifier>EISSN: 1861-8219</identifier><identifier>DOI: 10.1007/s11554-020-00977-w</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Acceleration ; Accuracy ; Algorithms ; Artificial neural networks ; Bandwidths ; Circuits ; Computer Graphics ; Computer Science ; Design ; Field programmable gate arrays ; Hardware ; Image classification ; Image Processing and Computer Vision ; Multimedia Information Systems ; Neural networks ; Object recognition ; Original Research Paper ; Pattern Recognition ; Performance evaluation ; Pipeline design ; Power ; Resource utilization ; Signal,Image and Speech Processing ; Workloads</subject><ispartof>Journal of real-time image processing, 2021-06, Vol.18 (3), p.481-492</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2020</rights><rights>Springer-Verlag GmbH Germany, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c385t-8126dc34ea818e9046a83c885122c47faf7f26aef41e81357ad0599d6b3b36fb3</citedby><cites>FETCH-LOGICAL-c385t-8126dc34ea818e9046a83c885122c47faf7f26aef41e81357ad0599d6b3b36fb3</cites><orcidid>0000-0002-0068-8824</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11554-020-00977-w$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918675736?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21367,27901,27902,33721,41464,42533,43781,51294</link.rule.ids></links><search><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Wang, Xiaoyun</creatorcontrib><creatorcontrib>Liu, Xinyang</creatorcontrib><creatorcontrib>Cao, Changfeng</creatorcontrib><creatorcontrib>Li, Huolin</creatorcontrib><creatorcontrib>Peng, Haiyong</creatorcontrib><creatorcontrib>Wang, Dong</creatorcontrib><title>A dedicated hardware accelerator for real-time acceleration of YOLOv2</title><title>Journal of real-time image processing</title><addtitle>J Real-Time Image Proc</addtitle><description>In recent years, dedicated hardware accelerators for the acceleration of the convolutional neural network (CNN) have been extensively studied. Although many studies have presented efficient designs on FPGAs for image classification neural network models such as AlexNet and VGG, there are still little implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on Arria-10 GX1150 FPGA. 
The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input image and full 8-bit fixed-point datapath to improve hardware resource utilization. Layer fusion technology that merges the convolution, batch normalization and Leaky-ReLU is also developed to avoid transmission of intermediate data between FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s under the working frequency of 190 MHz. The accelerator can execute YOLOv2 inference computation ( 288 × 288 resolution) and tiny YOLOv2 ( 416 × 416 resolution) at the speed of 35 and 71 FPS, respectively.</description><subject>Acceleration</subject><subject>Accuracy</subject><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>Bandwidths</subject><subject>Circuits</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Design</subject><subject>Field programmable gate arrays</subject><subject>Hardware</subject><subject>Image classification</subject><subject>Image Processing and Computer Vision</subject><subject>Multimedia Information Systems</subject><subject>Neural networks</subject><subject>Object recognition</subject><subject>Original Research Paper</subject><subject>Pattern Recognition</subject><subject>Performance evaluation</subject><subject>Pipeline design</subject><subject>Power</subject><subject>Resource utilization</subject><subject>Signal,Image and Speech 
Processing</subject><subject>Workloads</subject><issn>1861-8200</issn><issn>1861-8219</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kE1LAzEQhoMoWKt_wNOC5-gk2XzssRS_oNCLHjyFNJnolrZbk63Ff290RT15GGZg3vcd5iHknMElA9BXmTEpawocKECjNd0fkBEzilHDWXP4MwMck5OclwBKKyFH5HpSBQytdz2G6sWlsHcJK-c9rjC5vktVLJXQrWjfrv9s2m5TdbF6ms_mb_yUHEW3ynj23cfk8eb6YXpHZ_Pb--lkRr0wsqeGcRW8qNEZZrCBWjkjvDGSce5rHV3UkSuHsWZomJDaBZBNE9RCLISKCzEmF0PuNnWvO8y9XXa7tCknLW_Ki1pqoYqKDyqfupwTRrtN7dqld8vAfuKyAy5bcNkvXHZfTGIw5SLePGP6jf7H9QHQK2y9</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Xu, Ke</creator><creator>Wang, Xiaoyun</creator><creator>Liu, Xinyang</creator><creator>Cao, Changfeng</creator><creator>Li, Huolin</creator><creator>Peng, Haiyong</creator><creator>Wang, Dong</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><orcidid>https://orcid.org/0000-0002-0068-8824</orcidid></search><sort><creationdate>20210601</creationdate><title>A dedicated hardware accelerator for real-time acceleration of YOLOv2</title><author>Xu, Ke ; Wang, Xiaoyun ; Liu, Xinyang ; Cao, Changfeng ; Li, Huolin ; Peng, Haiyong ; Wang, 
Dong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c385t-8126dc34ea818e9046a83c885122c47faf7f26aef41e81357ad0599d6b3b36fb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Acceleration</topic><topic>Accuracy</topic><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>Bandwidths</topic><topic>Circuits</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Design</topic><topic>Field programmable gate arrays</topic><topic>Hardware</topic><topic>Image classification</topic><topic>Image Processing and Computer Vision</topic><topic>Multimedia Information Systems</topic><topic>Neural networks</topic><topic>Object recognition</topic><topic>Original Research Paper</topic><topic>Pattern Recognition</topic><topic>Performance evaluation</topic><topic>Pipeline design</topic><topic>Power</topic><topic>Resource utilization</topic><topic>Signal,Image and Speech Processing</topic><topic>Workloads</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xu, Ke</creatorcontrib><creatorcontrib>Wang, Xiaoyun</creatorcontrib><creatorcontrib>Liu, Xinyang</creatorcontrib><creatorcontrib>Cao, Changfeng</creatorcontrib><creatorcontrib>Li, Huolin</creatorcontrib><creatorcontrib>Peng, Haiyong</creatorcontrib><creatorcontrib>Wang, Dong</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central 
Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Journal of real-time image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xu, Ke</au><au>Wang, Xiaoyun</au><au>Liu, Xinyang</au><au>Cao, Changfeng</au><au>Li, Huolin</au><au>Peng, Haiyong</au><au>Wang, Dong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A dedicated hardware accelerator for real-time acceleration of YOLOv2</atitle><jtitle>Journal of real-time image processing</jtitle><stitle>J Real-Time Image Proc</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>18</volume><issue>3</issue><spage>481</spage><epage>492</epage><pages>481-492</pages><issn>1861-8200</issn><eissn>1861-8219</eissn><abstract>In recent years, dedicated hardware accelerators for the acceleration of the convolutional neural network (CNN) have been extensively studied. Although many studies have presented efficient designs on FPGAs for image classification neural network models such as AlexNet and VGG, there are still little implementations for CNN-based object detection applications. This paper presents an OpenCL-based high-throughput FPGA accelerator for the YOLOv2 object detection algorithm on Arria-10 GX1150 FPGA. The proposed hardware architecture adopts a scalable pipeline design to support multi-resolution input image and full 8-bit fixed-point datapath to improve hardware resource utilization. 
Layer fusion technology that merges the convolution, batch normalization and Leaky-ReLU is also developed to avoid transmission of intermediate data between FPGA and external memory. Experimental results show that the final design achieves a peak throughput of 566 GOP/s under the working frequency of 190 MHz. The accelerator can execute YOLOv2 inference computation ( 288 × 288 resolution) and tiny YOLOv2 ( 416 × 416 resolution) at the speed of 35 and 71 FPS, respectively.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s11554-020-00977-w</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-0068-8824</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1861-8200
ispartof Journal of real-time image processing, 2021-06, Vol.18 (3), p.481-492
issn 1861-8200
1861-8219
language eng
recordid cdi_proquest_journals_2918675736
source SpringerLink Journals; ProQuest Central
subjects Acceleration
Accuracy
Algorithms
Artificial neural networks
Bandwidths
Circuits
Computer Graphics
Computer Science
Design
Field programmable gate arrays
Hardware
Image classification
Image Processing and Computer Vision
Multimedia Information Systems
Neural networks
Object recognition
Original Research Paper
Pattern Recognition
Performance evaluation
Pipeline design
Power
Resource utilization
Signal,Image and Speech Processing
Workloads
title A dedicated hardware accelerator for real-time acceleration of YOLOv2
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T01%3A26%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20dedicated%20hardware%20accelerator%20for%20real-time%20acceleration%20of%20YOLOv2&rft.jtitle=Journal%20of%20real-time%20image%20processing&rft.au=Xu,%20Ke&rft.date=2021-06-01&rft.volume=18&rft.issue=3&rft.spage=481&rft.epage=492&rft.pages=481-492&rft.issn=1861-8200&rft.eissn=1861-8219&rft_id=info:doi/10.1007/s11554-020-00977-w&rft_dat=%3Cproquest_cross%3E2918675736%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918675736&rft_id=info:pmid/&rfr_iscdi=true