A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs
The wavelet transform is widely used in signal and image processing applications. Because of its adoption in time-critical applications such as streaming and real-time signal processing, many acceleration techniques have been developed over the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally intensive problems, and many GPU-based discrete wavelet transform (DWT) implementations have been introduced, but most of them do not fully exploit the potential of the GPU.
Saved in:
Published in: | IEEE transactions on parallel and distributed systems 2016-11, Vol.27 (11), p.3088-3100 |
Main authors: | Quan, Tran Minh; Jeong, Won-Ki |
Format: | Article |
Language: | eng |
Subjects: | Acceleration; bit rotation; Discrete wavelet transforms; GPU computing; Graphics processing units; hybrid parallelism; lifting scheme; Parallel processing; Registers; Wavelet transform; Wavelet transforms |
Online access: | Order full text |
container_end_page | 3100 |
container_issue | 11 |
container_start_page | 3088 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 27 |
creator | Quan, Tran Minh; Jeong, Won-Ki |
description | The wavelet transform is widely used in signal and image processing applications. Because of its adoption in time-critical applications such as streaming and real-time signal processing, many acceleration techniques have been developed over the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally intensive problems, and many GPU-based discrete wavelet transform (DWT) implementations have been introduced, but most of them do not fully exploit the potential of the GPU. In this paper, we present state-of-the-art GPU optimization strategies for DWT implementation, such as leveraging shared memory, registers, warp shuffle instructions, and thread- and instruction-level parallelism (TLP, ILP), and we elaborate a hybrid approach that further boosts performance. In addition, we introduce a novel mixed-band memory layout for the Haar DWT, in which a multi-level transform can be carried out in a single fused kernel launch. As a result, unlike recent GPU DWT methods that focus mainly on maximizing ILP, we show that optimal GPU DWT performance is achieved by hybrid parallelism that combines TLP and ILP in a mixed-band approach. We demonstrate the performance of the proposed method by comparison with other CPU and GPU DWT methods. | (An illustrative CUDA sketch of the warp-shuffle Haar DWT idea appears after the record fields below.)
doi_str_mv | 10.1109/TPDS.2016.2536028 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2016-11, Vol.27 (11), p.3088-3100 |
issn | 1045-9219; 1558-2183 |
language | eng |
recordid | cdi_ieee_primary_7422119 |
source | IEEE Electronic Library (IEL) |
subjects | Acceleration; bit rotation; Discrete wavelet transforms; GPU computing; Graphics processing units; hybrid parallelism; lifting scheme; Parallel processing; Registers; Wavelet transform; Wavelet transforms |
title | A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A52%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Fast%20Discrete%20Wavelet%20Transform%20Using%20Hybrid%20Parallelism%20on%20GPUs&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Quan,%20Tran%20Minh&rft.date=2016-11-01&rft.volume=27&rft.issue=11&rft.spage=3088&rft.epage=3100&rft.pages=3088-3100&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2016.2536028&rft_dat=%3Cproquest_RIE%3E4224029131%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1831021138&rft_id=info:pmid/&rft_ieee_id=7422119&rfr_iscdi=true |
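The abstract refers to warp shuffle instructions, registers, and the Haar DWT. As a rough, self-contained illustration of that general idea only (not the paper's mixed-band layout or fused multi-level kernel), the following CUDA sketch computes a single-level 1-D Haar transform within one warp, exchanging neighboring samples via __shfl_down_sync instead of shared memory. The kernel name, launch configuration, and the plain low/high sub-band output layout are assumptions made for this example.

```cuda
// Illustrative sketch only: single-level, 1-D Haar DWT over one warp of samples,
// using a warp shuffle to read the neighboring sample from registers.
// Assumes n is a multiple of the warp size so every lane in the mask is active.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void haar_dwt_warp(const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned mask = 0xffffffffu;          // all 32 lanes participate
    float x = in[tid];

    // Each even lane fetches its odd neighbor's sample (lane + 1) via registers.
    float neighbor = __shfl_down_sync(mask, x, 1);

    const float s = 0.70710678f;          // 1/sqrt(2), orthonormal Haar weights
    if ((tid & 1) == 0) {
        int pair = tid >> 1;
        out[pair]         = s * (x + neighbor);   // approximation (low-pass) band
        out[n / 2 + pair] = s * (x - neighbor);   // detail (high-pass) band
    }
}

int main()
{
    const int n = 32;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    haar_dwt_warp<<<1, 32>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("first approximation coefficient: %f\n", h_out[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The shuffle lets each even lane read its odd neighbor's sample without a round trip through shared memory, which is the kind of register-level data exchange the abstract mentions; the paper's mixed-band memory layout and fused multi-level kernel go further than this single-level, separate-band example.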