A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs

Abstract:
Wavelet transform has been widely used in many signal and image processing applications. Because of its wide adoption in time-critical applications, such as streaming and real-time signal processing, many acceleration techniques have been developed over the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally intensive problems, and many GPU-based discrete wavelet transform (DWT) solutions have been introduced, but most of them do not fully leverage the potential of the GPU. In this paper, we present state-of-the-art GPU optimization strategies for DWT implementations, such as leveraging shared memory, registers, warp shuffle instructions, and thread- and instruction-level parallelism (TLP and ILP), and we elaborate a hybrid approach that further boosts performance. In addition, we introduce a novel mixed-band memory layout for the Haar DWT, in which a multi-level transform can be carried out in a single fused kernel launch. As a result, unlike recent GPU DWT methods that focus mainly on maximizing ILP, we show that optimal GPU DWT performance is achieved by hybrid parallelism combining TLP and ILP in a mixed-band approach. We demonstrate the performance of the proposed method by comparison with other CPU and GPU DWT methods.
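
The abstract highlights warp shuffle instructions and register-level data reuse as key ingredients of a fast GPU DWT. As a rough illustration only, the CUDA sketch below performs a single level of a 1-D Haar transform with every sample held in a register and the pair partner fetched via __shfl_xor_sync. It is not the authors' code: the kernel name, launch configuration, and problem size are assumed for the example, and it writes the conventional separated-band output rather than the paper's mixed-band layout.

```cuda
// Minimal illustrative sketch (not the authors' implementation): one level of
// a 1-D Haar DWT in which every thread keeps its sample in a register and
// obtains its pair partner's sample through a warp shuffle instruction,
// avoiding shared-memory traffic for the butterfly step entirely.
// Assumes n is a multiple of the block size, so every warp is fully populated.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void haar1d_warp_shuffle(const float* in, float* out, int n)
{
    const int i     = blockIdx.x * blockDim.x + threadIdx.x;  // global sample index
    const bool even = (threadIdx.x & 1) == 0;                 // left element of its pair?

    float x = in[i];
    // XOR-shuffle with lane mask 1 swaps register values between lanes 2k and
    // 2k+1, so each thread now also holds its partner's sample in a register.
    float partner = __shfl_xor_sync(0xffffffffu, x, 1);

    const float s = 0.70710678f;            // 1/sqrt(2): orthonormal Haar weight
    float lo = (x + partner) * s;           // approximation (scaled average)
    float hi = even ? (x - partner) * s     // detail (scaled difference) ...
                    : (partner - x) * s;    // ... kept sign-consistent on odd lanes

    // Conventional separated-band output: approximations fill the first half of
    // the array, details the second half. (The paper's mixed-band layout instead
    // interleaves the bands so that several levels can run in one fused kernel.)
    const int pair = i >> 1;
    if (even) out[pair]         = lo;
    else      out[n / 2 + pair] = hi;
}

int main()
{
    const int n = 64;                       // one block of 64 threads (two warps)
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    haar1d_warp_shuffle<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // For this ramp input every detail coefficient is -1/sqrt(2) ~ -0.7071.
    printf("lo[0] = %f, hi[0] = %f\n", h_out[0], h_out[n / 2]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The same register-and-shuffle pattern extends to longer, lifting-based filters, which is where the trade-off between thread-level and instruction-level parallelism discussed in the abstract becomes the main tuning knob.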

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2016-11, Vol. 27 (11), p. 3088-3100
Main authors: Quan, Tran Minh; Jeong, Won-Ki
Format: Article
Language: English
DOI: 10.1109/TPDS.2016.2536028
Publisher: IEEE, New York
ISSN: 1045-9219
EISSN: 1558-2183
Source: IEEE Electronic Library (IEL)
Subjects: Acceleration; bit rotation; Discrete wavelet transforms; GPU computing; Graphics processing units; hybrid parallelism; lifting scheme; Parallel processing; Registers; Wavelet transform; Wavelet transforms