A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs
The wavelet transform is widely used in signal and image processing applications. Because of its adoption in time-critical applications such as streaming and real-time signal processing, many acceleration techniques have been developed over the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally intensive problems, and many GPU-based discrete wavelet transform (DWT) implementations have been introduced, but most of them do not fully exploit the potential of the GPU.
Saved in:
Published in: | IEEE transactions on parallel and distributed systems 2016-11, Vol.27 (11), p.3088-3100 |
Main authors: | Quan, Tran Minh; Jeong, Won-Ki |
Format: | Article |
Language: | eng |
Subjects: | Acceleration; bit rotation; Discrete wavelet transforms; GPU computing; Graphics processing units; hybrid parallelism; lifting scheme; Parallel processing; Registers; Wavelet transform; Wavelet transforms |
Online access: | Order full text |
container_end_page | 3100 |
container_issue | 11 |
container_start_page | 3088 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 27 |
creator | Quan, Tran Minh; Jeong, Won-Ki |
description | The wavelet transform is widely used in signal and image processing applications. Because of its adoption in time-critical applications such as streaming and real-time signal processing, many acceleration techniques have been developed over the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally intensive problems, and many GPU-based discrete wavelet transform (DWT) implementations have been introduced, but most of them do not fully exploit the potential of the GPU. In this paper, we present state-of-the-art GPU optimization strategies for DWT implementation, such as leveraging shared memory, registers, warp shuffle instructions, and thread- and instruction-level parallelism (TLP, ILP), and we elaborate a hybrid approach that further boosts performance. In addition, we introduce a novel mixed-band memory layout for the Haar DWT, in which a multi-level transform can be carried out in a single fused kernel launch. As a result, unlike recent GPU DWT methods that focus mainly on maximizing ILP, we show that optimal GPU DWT performance is achieved by hybrid parallelism that combines TLP and ILP in a mixed-band approach. We demonstrate the performance of the proposed method by comparison with other CPU and GPU DWT methods. | (An illustrative CUDA sketch of the warp-shuffle Haar DWT idea appears after the record fields below.)
doi_str_mv | 10.1109/TPDS.2016.2536028 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2016-11, Vol.27 (11), p.3088-3100 |
issn | 1045-9219; 1558-2183 |
language | eng |
recordid | cdi_ieee_primary_7422119 |
source | IEEE Electronic Library (IEL) |
subjects | Acceleration; bit rotation; Discrete wavelet transforms; GPU computing; Graphics processing units; hybrid parallelism; lifting scheme; Parallel processing; Registers; Wavelet transform; Wavelet transforms |
title | A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A52%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Fast%20Discrete%20Wavelet%20Transform%20Using%20Hybrid%20Parallelism%20on%20GPUs&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Quan,%20Tran%20Minh&rft.date=2016-11-01&rft.volume=27&rft.issue=11&rft.spage=3088&rft.epage=3100&rft.pages=3088-3100&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2016.2536028&rft_dat=%3Cproquest_RIE%3E4224029131%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1831021138&rft_id=info:pmid/&rft_ieee_id=7422119&rfr_iscdi=true |
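The abstract refers to warp shuffle instructions, registers, and the Haar DWT. As a rough, self-contained illustration of that general idea only (not the paper's mixed-band layout or fused multi-level kernel), the following CUDA sketch computes a single-level 1-D Haar transform within one warp, exchanging neighboring samples via __shfl_down_sync instead of shared memory. The kernel name, launch configuration, and the plain low/high sub-band output layout are assumptions made for this example.

```cuda
// Illustrative sketch only: single-level, 1-D Haar DWT over one warp of samples,
// using a warp shuffle to read the neighboring sample from registers.
// Assumes n is a multiple of the warp size so every lane in the mask is active.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void haar_dwt_warp(const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    unsigned mask = 0xffffffffu;          // all 32 lanes participate
    float x = in[tid];

    // Each even lane fetches its odd neighbor's sample (lane + 1) via registers.
    float neighbor = __shfl_down_sync(mask, x, 1);

    const float s = 0.70710678f;          // 1/sqrt(2), orthonormal Haar weights
    if ((tid & 1) == 0) {
        int pair = tid >> 1;
        out[pair]         = s * (x + neighbor);   // approximation (low-pass) band
        out[n / 2 + pair] = s * (x - neighbor);   // detail (high-pass) band
    }
}

int main()
{
    const int n = 32;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    haar_dwt_warp<<<1, 32>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("first approximation coefficient: %f\n", h_out[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The shuffle lets each even lane read its odd neighbor's sample without a round trip through shared memory, which is the kind of register-level data exchange the abstract mentions; the paper's mixed-band memory layout and fused multi-level kernel go further than this single-level, separate-band example.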