ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree
Published in: | ACM transactions on reconfigurable technology and systems 2015-11, Vol.9 (1), p.1-16 |
---|---|
Main authors: | Bai, Yuhui; Ahmed, Syed Zahid; Granado, Bertrand |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 16 |
---|---|
container_issue | 1 |
container_start_page | 1 |
container_title | ACM transactions on reconfigurable technology and systems |
container_volume | 9 |
creator | Bai, Yuhui; Ahmed, Syed Zahid; Granado, Bertrand |
description | Embedded image processing systems such as smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage, which makes efficient image coding important. In this article, we present a novel heap-based priority queue structure used by an Adaptive Scanning of Wavelet Data (ASWD) scheme targeting an embedded platform. ASWD is a context modeling block, implemented with priority queues, that reorganizes the wavelet coefficients of a wavelet-based image coder into locally stationary sequences. The proposed architecture makes efficient, adaptive use of the FPGA's on-chip dual-port memories. An index linked to each queue element makes the element's location in the heap traceable, as the ASWD algorithm requires. Moreover, 4-port memories combined with intelligent concatenation of queue elements yield cost-effective, enhanced memory access. The memory ports are adaptively assigned to different units in each processing phase so as to best match the memory accesses that phase requires. These architectural innovations can also be exploited in other applications that need efficient hardware implementations of a generic priority queue, or of classical sorting applications that sort into the index. We designed and validated the hardware on an Altera Stratix IV FPGA as an IP accelerator in a Nios II processor-based System on Chip. We show that our architecture at 150 MHz can provide a 45× speedup over an embedded ARM Cortex-A9 processor at 666 MHz, targeting a throughput of 10 MB/s. |
doi_str_mv | 10.1145/2766454 |
format | Article |
rights | Distributed under a Creative Commons Attribution 4.0 International License |
orcidid | https://orcid.org/0000-0002-9667-9737 |
backlink | https://hal.science/hal-01534260 (View record in HAL, free for read) |
fulltext | fulltext |
identifier | ISSN: 1936-7406 |
ispartof | ACM transactions on reconfigurable technology and systems, 2015-11, Vol.9 (1), p.1-16 |
issn | 1936-7406 (ISSN); 1936-7414 (EISSN) |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_01534260v1 |
source | ACM Digital Library Complete |
subjects | Computer Science; Electronics; Engineering Sciences; Hardware Architecture |
title | ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T11%3A34%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ARC%202014:%20Towards%20a%20Fast%20FPGA%20Implementation%20of%20a%20Heap-Based%20Priority%20Queue%20for%20Image%20Coding%20Using%20a%20Parallel%20Index-Aware%20Tree&rft.jtitle=ACM%20transactions%20on%20reconfigurable%20technology%20and%20systems&rft.au=Bai,%20Yuhui&rft.date=2015-11-01&rft.volume=9&rft.issue=1&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1936-7406&rft.eissn=1936-7414&rft_id=info:doi/10.1145/2766454&rft_dat=%3Chal_cross%3Eoai_HAL_hal_01534260v1%3C/hal_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
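The description above states that an index linked to each queue element makes the element's location in the heap traceable, as ASWD requires. The paper's contribution is an FPGA architecture (dual-port and 4-port on-chip memories, adaptive port assignment), which is not reproduced here. As a minimal software sketch, assuming the index-aware idea corresponds to the textbook indexed priority queue (a binary heap plus a position map from element index to heap slot), the Python below shows why such tracking matters: a queued element can be found and re-prioritized in O(log n) without scanning the heap. The class name `IndexedMaxHeap`, its methods, and the choice of a max-heap are illustrative assumptions, not the authors' interface.

```python
from typing import Dict, List, Tuple


class IndexedMaxHeap:
    """Binary max-heap that records where each element sits in the array.

    Hypothetical software sketch of an "index-aware" priority queue, not the
    paper's hardware design: pos[item] is the heap slot currently holding
    item, so a queued element can be located in O(1) and re-prioritized in
    O(log n).
    """

    def __init__(self) -> None:
        self.heap: List[Tuple[int, int]] = []  # (priority, item) pairs
        self.pos: Dict[int, int] = {}          # item -> slot in self.heap

    def __len__(self) -> int:
        return len(self.heap)

    def _swap(self, i: int, j: int) -> None:
        # Swap two slots and keep the position map consistent with them.
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i: int) -> None:
        # Move slot i toward the root while it beats its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        # Move slot i toward the leaves while a child beats it.
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def push(self, item: int, priority: int) -> None:
        if item in self.pos:
            raise KeyError(f"item {item} already queued")
        self.heap.append((priority, item))
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def pop(self) -> Tuple[int, int]:
        """Remove and return (item, priority) with the highest priority."""
        if not self.heap:
            raise IndexError("pop from empty heap")
        self._swap(0, len(self.heap) - 1)
        priority, item = self.heap.pop()
        del self.pos[item]
        if self.heap:
            self._sift_down(0)
        return item, priority

    def update(self, item: int, priority: int) -> None:
        """Change the priority of an element that is already queued."""
        i = self.pos[item]  # direct lookup through the position map
        old = self.heap[i][0]
        self.heap[i] = (priority, item)
        if priority > old:
            self._sift_up(i)
        else:
            self._sift_down(i)


if __name__ == "__main__":
    demo = IndexedMaxHeap()
    for item, prio in [(0, 5), (1, 9), (2, 3)]:
        demo.push(item, prio)
    demo.update(2, 12)  # item 2 is located through demo.pos, then sifted up
    print(demo.pop())   # (2, 12)
```

The position map is what distinguishes this from a plain heap: Python's standard `heapq` module keeps no such map, which is why its documentation suggests marking an entry as removed and pushing a new one when a priority changes.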