A Parallel Implementation of the 2D Wavelet Transform Using CUDA

One multicore platform is currently attracting enormous attention because of its tremendous potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing...

Bibliographic Details
Main authors:	Franco, J.; Bernabe, G.; Fernandez, J.; Acacio, M.E.
Format:	Conference proceeding
Language:	eng
Subjects:	2D fast wavelet transform; Central Processing Unit; Computer architecture; CUDA; Discrete cosine transforms; Graphics; Image coding; Multicore processing; multicore processor; NVIDIA Tesla; parallel programming; Scientific computing; Video compression; Wavelet transforms; Yarn
Online access:	Order full text

container_end_page	118
container_issue
container_start_page	111
container_title
container_volume
creator	Franco, J.; Bernabe, G.; Fernandez, J.; Acacio, M.E.
description	One multicore platform is currently attracting enormous attention because of its tremendous potential for sustained performance: the NVIDIA Tesla boards. These cards, intended for general-purpose computing on graphics processing units (GPGPU), are used as data-parallel computing devices. They are based on the Compute Unified Device Architecture (CUDA), which is common to the latest NVIDIA GPUs. The bottom line is a multicore platform that offers an enormous potential performance benefit, driven by a non-traditional programming model. In this paper we try to provide some insight into the peculiarities of CUDA in order to target scientific computing by means of a specific example. In particular, we show that the parallelization of the two-dimensional fast wavelet transform for the NVIDIA Tesla C870 achieves a speedup of 20.8 for an image size of 8192 × 8192, when compared with the fastest host-only implementation using OpenMP, and including the data transfers between main memory and device memory.
doi_str_mv	10.1109/PDP.2009.40
format	Conference Proceeding
fulltext	fulltext_linktorsrc
identifier	ISSN: 1066-6192
ispartof	2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2009, p.111-118
issn	1066-6192 2377-5750
language	eng
recordid	cdi_ieee_primary_4912922
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	2D fast wavelet transform; Central Processing Unit; Computer architecture; CUDA; Discrete cosine transforms; Graphics; Image coding; Multicore processing; multicore processor; NVIDIA Tesla; parallel programming; Scientific computing; Video compression; Wavelet transforms; Yarn
title	A Parallel Implementation of the 2D Wavelet Transform Using CUDA
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T05%3A43%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Parallel%20Implementation%20of%20the%202D%20Wavelet%20Transform%20Using%20CUDA&rft.btitle=2009%2017th%20Euromicro%20International%20Conference%20on%20Parallel,%20Distributed%20and%20Network-based%20Processing&rft.au=Franco,%20J.&rft.date=2009-02&rft.spage=111&rft.epage=118&rft.pages=111-118&rft.issn=1066-6192&rft.eissn=2377-5750&rft.isbn=9780769535449&rft.isbn_list=0769535445&rft_id=info:doi/10.1109/PDP.2009.40&rft_dat=%3Cieee_6IE%3E4912922%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4912922&rfr_iscdi=true