A Parallel Implementation of the 2D Wavelet Transform Using CUDA
One multicore platform is currently attracting enormous attention because of its potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing...
Main authors: | Franco, J., Bernabe, G., Fernandez, J., Acacio, M.E. |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 118 |
---|---|
container_issue | |
container_start_page | 111 |
container_title | |
container_volume | |
creator | Franco, J. ; Bernabe, G. ; Fernandez, J. ; Acacio, M.E. |
description | One multicore platform is currently attracting enormous attention because of its potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices. They are based on the Compute Unified Device Architecture (CUDA), which is common to the latest NVIDIA GPUs. The result is a multicore platform that offers a large potential performance benefit, driven by a non-traditional programming model. In this paper we try to provide some insight into the peculiarities of CUDA for scientific computing by means of a specific example. In particular, we show that the parallelization of the two-dimensional fast wavelet transform on the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192×8192, when compared with the fastest host-only implementation using OpenMP and including the data transfers between main memory and device memory. |
doi_str_mv | 10.1109/PDP.2009.40 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1066-6192 |
ispartof | 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009, p.111-118 |
issn | 1066-6192 ; 2377-5750 |
language | eng |
recordid | cdi_ieee_primary_4912922 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | 2D fast wavelet transform ; Central Processing Unit ; Computer architecture ; CUDA ; Discrete cosine transforms ; Graphics ; Image coding ; Multicore processing ; multicore processor ; NVIDIA Tesla ; parallel programming ; Scientific computing ; Video compression ; Wavelet transforms ; Yarn |
title | A Parallel Implementation of the 2D Wavelet Transform Using CUDA |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T05%3A43%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Parallel%20Implementation%20of%20the%202D%20Wavelet%20Transform%20Using%20CUDA&rft.btitle=2009%2017th%20Euromicro%20International%20Conference%20on%20Parallel,%20Distributed%20and%20Network-based%20Processing&rft.au=Franco,%20J.&rft.date=2009-02&rft.spage=111&rft.epage=118&rft.pages=111-118&rft.issn=1066-6192&rft.eissn=2377-5750&rft.isbn=9780769535449&rft.isbn_list=0769535445&rft_id=info:doi/10.1109/PDP.2009.40&rft_dat=%3Cieee_6IE%3E4912922%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4912922&rfr_iscdi=true |