ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree

Bibliographic details
Published in:	ACM Transactions on Reconfigurable Technology and Systems, 2015-11, Vol. 9 (1), pp. 1-16
Authors:	Bai, Yuhui; Ahmed, Syed Zahid; Granado, Bertrand
Format:	Article
Language:	English
Subjects:	Computer Science; Electronics; Engineering Sciences; Hardware Architecture

container_end_page	16
container_issue	1
container_start_page	1
container_title	ACM transactions on reconfigurable technology and systems
container_volume	9
creator	Bai, Yuhui; Ahmed, Syed Zahid; Granado, Bertrand
description	Embedded image processing systems such as smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage, which makes efficient image coding important. In this article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data (ASWD) scheme targeting an embedded platform. ASWD is a context-modeling block, implemented via priority queues in a wavelet-based image coder, that reorganizes the wavelet coefficients into locally stationary sequences. The proposed architecture makes efficient, adaptive use of the FPGA’s on-chip dual-port memories. An index-aware system linked to each element in the queue makes the location of every queue element traceable in the heap, as required by the ASWD algorithm. Moreover, the use of 4-port memories together with intelligent data concatenation of queue elements yields cost-effective, enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases so that each phase makes optimal use of the memory accesses it requires. These architectural innovations can also be exploited in other applications that require efficient hardware implementations of a generic priority queue, or in classical sorting applications that sort into an index. We designed and validated the hardware on an Altera Stratix IV FPGA as an IP accelerator in a Nios II processor-based System on Chip. We show that our architecture at 150 MHz can provide a 45× speedup compared to an embedded ARM Cortex-A9 processor at 666 MHz, for a target throughput of 10 MB/s.
doi_str_mv	10.1145/2766454
format	Article
fullrecord	<record><control><sourceid>hal_cross</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01534260v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_HAL_hal_01534260v1</sourcerecordid><originalsourceid>FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</originalsourceid><addsrcrecordid>eNo9j0FLAzEQhYNYaG3Fv-BNPKzOJJNJ9rgsrRUWBNFzmGwTrFQqGxH8925pKTyYx-ObYZ5SNwgPiGQftWMmSxdqhrXhyhHS5dkDT9VVKZ8AbNjTTE2b1_ZWA9JCTbLsSro-zbl6Xy3f2nXVvTw9t01X9VrjT0Ug6OoYE0j21sakN7VQFpINjEqmTy5Z5zEKEXEef2GdBY1njIm9mav7490P2YXvYfslw1_Yyzasmy4cMkBrSDP84sjeHdl-2JcypHxeQAiHsuFU1vwDhJJBoA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</title><source>ACM Digital Library Complete</source><creator>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</creator><creatorcontrib>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</creatorcontrib><description>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. 
Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</description><identifier>ISSN: 1936-7406</identifier><identifier>EISSN: 1936-7414</identifier><identifier>DOI: 10.1145/2766454</identifier><language>eng</language><publisher>ACM</publisher><subject>Computer Science ; Electronics ; Engineering Sciences ; Hardware Architecture</subject><ispartof>ACM transactions on reconfigurable technology and systems, 2015-11, Vol.9 (1), p.1-16</ispartof><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</cites><orcidid>0000-0002-9667-9737</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://hal.science/hal-01534260$$DView record in 
HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Bai, Yuhui</creatorcontrib><creatorcontrib>Ahmed, Syed Zahid</creatorcontrib><creatorcontrib>Granado, Bertrand</creatorcontrib><title>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</title><title>ACM transactions on reconfigurable technology and systems</title><description>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. 
We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</description><subject>Computer Science</subject><subject>Electronics</subject><subject>Engineering Sciences</subject><subject>Hardware Architecture</subject><issn>1936-7406</issn><issn>1936-7414</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNo9j0FLAzEQhYNYaG3Fv-BNPKzOJJNJ9rgsrRUWBNFzmGwTrFQqGxH8925pKTyYx-ObYZ5SNwgPiGQftWMmSxdqhrXhyhHS5dkDT9VVKZ8AbNjTTE2b1_ZWA9JCTbLsSro-zbl6Xy3f2nXVvTw9t01X9VrjT0Ug6OoYE0j21sakN7VQFpINjEqmTy5Z5zEKEXEef2GdBY1njIm9mav7490P2YXvYfslw1_Yyzasmy4cMkBrSDP84sjeHdl-2JcypHxeQAiHsuFU1vwDhJJBoA</recordid><startdate>20151101</startdate><enddate>20151101</enddate><creator>Bai, Yuhui</creator><creator>Ahmed, Syed Zahid</creator><creator>Granado, Bertrand</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0002-9667-9737</orcidid></search><sort><creationdate>20151101</creationdate><title>ARC 2014</title><author>Bai, Yuhui ; Ahmed, Syed Zahid ; Granado, Bertrand</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c221t-40a179bbe0af855be2d9a4fa4ad0ad0e3ce7e5781ba4446f66462fa13861be683</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Computer Science</topic><topic>Electronics</topic><topic>Engineering Sciences</topic><topic>Hardware Architecture</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bai, Yuhui</creatorcontrib><creatorcontrib>Ahmed, Syed Zahid</creatorcontrib><creatorcontrib>Granado, Bertrand</creatorcontrib><collection>CrossRef</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>ACM transactions on reconfigurable technology and 
systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bai, Yuhui</au><au>Ahmed, Syed Zahid</au><au>Granado, Bertrand</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree</atitle><jtitle>ACM transactions on reconfigurable technology and systems</jtitle><date>2015-11-01</date><risdate>2015</risdate><volume>9</volume><issue>1</issue><spage>1</spage><epage>16</epage><pages>1-16</pages><issn>1936-7406</issn><eissn>1936-7414</eissn><abstract>The embedded image processing systems like smartphones and digital cameras have tight limits on storage, computation power, network connectivity, and battery usage. These limitations make it important to ensure efficient image coding. In the article, we present a novel heap-based priority queue structure employed by an Adaptive Scanning of Wavelet Data scheme (ASWD) targeting an embedded platform. ASWD is a context modeling block implemented via priority queues in a wavelet-based image coder to reorganize the wavelet coefficients into locally stationary sequences. The architecture we propose exploits efficient use of FPGA’s on-chip dual-port memories in an adaptive manner. Innovations of index-aware system linked to each element in the queue makes the location of queue element traceable in the heap as per the requirements of the ASWD algorithm. Moreover, use of 4-port memories along with intelligent data concatenation of queue elements yielded in a cost effective enhanced memory access. The memory ports are adaptively assigned to different units during different processing phases in a manner to optimally take advantage of memory access required by that phase. 
The architectural innovations can also be exploited in other applications that require efficient hardware implementations of generic priority queue or classical sorting applications which sort into the index. We designed and validated the hardware on an Altera’s Stratix IV FPGA as an IP accelerator in a Nios II processor based System on Chip. We show that our architecture at 150MHz can provide 45X speedup compared to an embedded ARM Cortex-A9 processor at 666MHz targeting the throughput of 10MB/s.</abstract><pub>ACM</pub><doi>10.1145/2766454</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-9667-9737</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1936-7406
ispartof	ACM transactions on reconfigurable technology and systems, 2015-11, Vol.9 (1), p.1-16
issn	1936-7406 (ISSN); 1936-7414 (EISSN)
language	eng
recordid	cdi_hal_primary_oai_HAL_hal_01534260v1
source	ACM Digital Library Complete
subjects	Computer Science; Electronics; Engineering Sciences; Hardware Architecture
title	ARC 2014: Towards a Fast FPGA Implementation of a Heap-Based Priority Queue for Image Coding Using a Parallel Index-Aware Tree
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T11%3A34%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ARC%202014:%20Towards%20a%20Fast%20FPGA%20Implementation%20of%20a%20Heap-Based%20Priority%20Queue%20for%20Image%20Coding%20Using%20a%20Parallel%20Index-Aware%20Tree&rft.jtitle=ACM%20transactions%20on%20reconfigurable%20technology%20and%20systems&rft.au=Bai,%20Yuhui&rft.date=2015-11-01&rft.volume=9&rft.issue=1&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1936-7406&rft.eissn=1936-7414&rft_id=info:doi/10.1145/2766454&rft_dat=%3Chal_cross%3Eoai_HAL_hal_01534260v1%3C/hal_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
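The "index-aware" priority queue described in the abstract can be illustrated in software with an indexed binary heap. Alongside the heap array, a position map records which heap slot each queue element currently occupies, and every swap updates that map, so an element's location is always traceable and its priority can be changed without searching the heap. The sketch below is a minimal Python model under stated assumptions, not the authors' FPGA architecture: it assumes elements are identified by integer indices (for example, wavelet-coefficient positions) and that a larger priority is served first. The class and method names (`IndexedMaxHeap`, `push`, `update`, `pop`) are hypothetical, and the dual-port and 4-port memory organization is not modeled.

```python
from typing import Dict, List, Tuple


class IndexedMaxHeap:
    """Hypothetical software model of an index-aware max-priority queue.

    `heap` stores (priority, element index) pairs in binary-heap order;
    `pos` maps each element index to its current slot in `heap`, so any
    element can be located in O(1) and re-prioritized in O(log n).
    """

    def __init__(self) -> None:
        self.heap: List[Tuple[int, int]] = []  # (priority, element index)
        self.pos: Dict[int, int] = {}          # element index -> heap slot

    def __len__(self) -> int:
        return len(self.heap)

    def _swap(self, i: int, j: int) -> None:
        # Swap two heap slots and keep the position map consistent.
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i: int) -> None:
        # Move slot i toward the root while it beats its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] <= self.heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i: int) -> None:
        # Move slot i toward the leaves while a child beats it.
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def push(self, index: int, priority: int) -> None:
        if index in self.pos:
            raise KeyError(f"element {index} already queued")
        self.heap.append((priority, index))
        self.pos[index] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def update(self, index: int, priority: int) -> None:
        # The position map locates the element directly, with no search.
        i = self.pos[index]
        old = self.heap[i][0]
        self.heap[i] = (priority, index)
        if priority > old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def pop(self) -> Tuple[int, int]:
        # Remove and return (element index, priority) of the maximum.
        top_priority, top_index = self.heap[0]
        last = len(self.heap) - 1
        self._swap(0, last)
        self.heap.pop()
        del self.pos[top_index]
        if self.heap:
            self._sift_down(0)
        return top_index, top_priority


if __name__ == "__main__":
    q = IndexedMaxHeap()
    for idx, prio in [(10, 3), (11, 7), (12, 5)]:
        q.push(idx, prio)
    q.update(10, 9)  # raise element 10 above the others
    print(q.pop())   # (10, 9)
    print(q.pop())   # (11, 7)
```

Keeping `pos` in lockstep with every swap is what turns a priority change into an O(log n) sift instead of an O(n) search for the element; how the published design maps this bookkeeping onto FPGA memory ports is described in the full article, not in this record.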