ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree

Bibliographic details
Published in: ACM Transactions on Reconfigurable Technology and Systems, 2015-11, Vol. 9 (1), pp. 1-16
Main authors: Bai, Yuhui; Ahmed, Syed Zahid; Granado, Bertrand
Format: Article
Language: English
Subjects: Computer Science; Electronics; Engineering Sciences; Hardware Architecture
Online access: Full text
container_end_page 16
container_issue 1
container_start_page 1
container_title ACM transactions on reconfigurable technology and systems
container_volume 9
creator Bai, Yuhui
Ahmed, Syed Zahid
Granado, Bertrand
description Embedded image processing systems such as smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make efficient image coding essential. In this article, we present a novel heap-based priority queue structure used by an Adaptive Scanning of Wavelet Data (ASWD) scheme targeting an embedded platform. ASWD is a context modeling block, implemented with priority queues, in a wavelet-based image coder; it reorganizes the wavelet coefficients into locally stationary sequences. The proposed architecture makes efficient, adaptive use of the FPGA's on-chip dual-port memories. An index-aware scheme, which links an index to each element in the queue, makes the location of every queue element in the heap traceable, as the ASWD algorithm requires. Moreover, using 4-port memories together with intelligent concatenation of queue elements yields cost-effective, enhanced memory access. The memory ports are adaptively assigned to different units in each processing phase so as to best exploit the memory access pattern that phase requires. These architectural innovations can also benefit other applications that require efficient hardware implementations of generic priority queues, or classical sorting applications that sort into an index. We designed and validated the hardware on an Altera Stratix IV FPGA as an IP accelerator in a Nios II processor-based System on Chip. We show that our architecture, running at 150 MHz, can provide a 45× speedup over an embedded ARM Cortex-A9 processor running at 666 MHz, targeting a throughput of 10 MB/s.
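The index-aware idea described above, where each queue element carries an index so that its current position in the heap can be looked up directly, resembles the classic software "indexed priority queue". The sketch below illustrates only that generic data structure in Python; it is not the authors' FPGA architecture. The class name `IndexedMaxHeap`, its methods, and the choice of a max-heap are hypothetical, and treating an element index as, say, a wavelet coefficient's position is an assumption made for illustration.

```python
class IndexedMaxHeap:
    """Hypothetical sketch: a binary max-heap whose elements are tracked by index.

    pos[i] records the heap slot currently holding element i, so an element's
    priority can be changed in O(log n) without searching the heap for it.
    """

    def __init__(self):
        self.heap = []  # heap slots hold element indices
        self.key = {}   # element index -> priority
        self.pos = {}   # element index -> slot in self.heap

    def __len__(self):
        return len(self.heap)

    def _swap(self, a, b):
        # Swap two slots and keep the position map consistent with the heap.
        h = self.heap
        h[a], h[b] = h[b], h[a]
        self.pos[h[a]] = a
        self.pos[h[b]] = b

    def _sift_up(self, s):
        while s > 0:
            parent = (s - 1) // 2
            if self.key[self.heap[s]] <= self.key[self.heap[parent]]:
                break
            self._swap(s, parent)
            s = parent

    def _sift_down(self, s):
        n = len(self.heap)
        while True:
            largest = s
            for child in (2 * s + 1, 2 * s + 2):
                if child < n and self.key[self.heap[child]] > self.key[self.heap[largest]]:
                    largest = child
            if largest == s:
                break
            self._swap(s, largest)
            s = largest

    def push(self, idx, priority):
        if idx in self.pos:
            raise KeyError(f"element {idx} is already in the heap")
        self.key[idx] = priority
        self.pos[idx] = len(self.heap)
        self.heap.append(idx)
        self._sift_up(self.pos[idx])

    def update(self, idx, priority):
        # The position map locates the element directly; no linear search.
        old = self.key[idx]
        self.key[idx] = priority
        s = self.pos[idx]
        if priority > old:
            self._sift_up(s)
        else:
            self._sift_down(s)

    def pop(self):
        # Remove and return (index, priority) of the highest-priority element.
        # Raises IndexError if the heap is empty.
        top = self.heap[0]
        self._swap(0, len(self.heap) - 1)
        self.heap.pop()
        del self.pos[top]
        prio = self.key.pop(top)
        if self.heap:
            self._sift_down(0)
        return top, prio


pq = IndexedMaxHeap()
for i, p in enumerate([5, 1, 8, 3]):
    pq.push(i, p)
pq.update(1, 10)  # raise element 1's priority by looking up its slot via its index
print([pq.pop() for _ in range(len(pq))])  # [(1, 10), (2, 8), (0, 5), (3, 3)]
```

In software the position map is simply an extra dictionary updated on every swap; the article's hardware design, which maps the queue onto FPGA on-chip memories with adaptively assigned ports, is not modeled here.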
doi_str_mv 10.1145/2766454
format Article
fullrecord <record><control><sourceid>hal_cross</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01534260v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_HAL_hal_01534260v1</sourcerecordid><originalsourceid>FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</originalsourceid><addsrcrecordid>eNo9j0FLAzEQhYNYaG3Fv-BNPKzOJJNJ9rgsrRUWBNFzmGwTrFQqGxH8925pKTyYx-ObYZ5SNwgPiGQftWMmSxdqhrXhyhHS5dkDT9VVKZ8AbNjTTE2b1_ZWA9JCTbLsSro-zbl6Xy3f2nXVvTw9t01X9VrjT0Ug6OoYE0j21sakN7VQFpINjEqmTy5Z5zEKEXEef2GdBY1njIm9mav7490P2YXvYfslw1_Yyzasmy4cMkBrSDP84sjeHdl-2JcypHxeQAiHsuFU1vwDhJJBoA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</title><source>ACM Digital Library Complete</source><creator>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</creator><creatorcontrib>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</creatorcontrib><description>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. 
Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</description><identifier>ISSN: 1936-7406</identifier><identifier>EISSN: 1936-7414</identifier><identifier>DOI: 10.1145/2766454</identifier><language>eng</language><publisher>ACM</publisher><subject>Computer Science ; Electronics ; Engineering Sciences ; Hardware Architecture</subject><ispartof>ACM transactions on reconfigurable technology and systems, 2015-11, Vol.9 (1), p.1-16</ispartof><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</cites><orcidid>0000-0002-9667-9737</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://hal.science/hal-01534260$$DView record in 
HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Bai, Yuhui</creatorcontrib><creatorcontrib>Ahmed, Syed Zahid</creatorcontrib><creatorcontrib>Granado, Bertrand</creatorcontrib><title>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</title><title>ACM transactions on reconfigurable technology and systems</title><description>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. 
We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</description><subject>Computer Science</subject><subject>Electronics</subject><subject>Engineering Sciences</subject><subject>Hardware Architecture</subject><issn>1936-7406</issn><issn>1936-7414</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNo9j0FLAzEQhYNYaG3Fv-BNPKzOJJNJ9rgsrRUWBNFzmGwTrFQqGxH8925pKTyYx-ObYZ5SNwgPiGQftWMmSxdqhrXhyhHS5dkDT9VVKZ8AbNjTTE2b1_ZWA9JCTbLsSro-zbl6Xy3f2nXVvTw9t01X9VrjT0Ug6OoYE0j21sakN7VQFpINjEqmTy5Z5zEKEXEef2GdBY1njIm9mav7490P2YXvYfslw1_Yyzasmy4cMkBrSDP84sjeHdl-2JcypHxeQAiHsuFU1vwDhJJBoA</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Bai, Yuhui</creator><creator>Ahmed, Syed Zahid</creator><creator>Granado, Bertrand</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0002-9667-9737</orcidid></search><sort><creationdate>20151101</creationdate><title>ARC 2014</title><author>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Computer Science</topic><topic>Electronics</topic><topic>Engineering Sciences</topic><topic>Hardware Architecture</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bai, Yuhui</creatorcontrib><creatorcontrib>Ahmed, Syed Zahid</creatorcontrib><creatorcontrib>Granado, Bertrand</creatorcontrib><collection>CrossRef</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>ACM transactions on reconfigurable technology and 
systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bai, Yuhui</au><au>Ahmed, Syed Zahid</au><au>Granado, Bertrand</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</atitle><jtitle>ACM transactions on reconfigurable technology and systems</jtitle><date>2015-11-01</date><risdate>2015</risdate><volume>9</volume><issue>1</issue><spage>1</spage><epage>16</epage><pages>1-16</pages><issn>1936-7406</issn><eissn>1936-7414</eissn><abstract>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. 
The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</abstract><pub>ACM</pub><doi>10.1145/2766454</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-9667-9737</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1936-7406
ispartof ACM transactions on reconfigurable technology and systems, 2015-11, Vol.9 (1), p.1-16
issn 1936-7406
1936-7414
language eng
recordid cdi_hal_primary_oai_HAL_hal_01534260v1
source ACM Digital Library Complete
subjects Computer Science
Electronics
Engineering Sciences
Hardware Architecture
title ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T11%3A34%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ARC%202014:%20Towards%20a%20Fast%20FPGA%20Implementation%20of%20a%20Heap-Based%20Priority%20Queue%20for%20Image%20Coding%20Using%20a%20Parallel%20Index-Aware%20Tree&rft.jtitle=ACM%20transactions%20on%20reconfigurable%20technology%20and%20systems&rft.au=Bai,%20Yuhui&rft.date=2015-11-01&rft.volume=9&rft.issue=1&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1936-7406&rft.eissn=1936-7414&rft_id=info:doi/10.1145/2766454&rft_dat=%3Chal_cross%3Eoai_HAL_hal_01534260v1%3C/hal_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true