Compute Unified Device Architecture Application Suitability

Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.

Detailed description

Bibliographic details
Published in:	Computing in science & engineering 2009-05, Vol.11 (3), p.16-26
Main authors:	Hwu, Wen-Mei ; Rodrigues, Christopher ; Ryoo, Shane ; Stratton, John
Format:	Article
Language:	eng
Subjects:	Architecture ; benchmarks ; Central Processing Unit ; Computation ; compute unified device architecture ; Computer architecture ; Costs ; CUDA ; Devices ; general-purpose computing on GPU ; GPGPU ; Graphics ; Hardware ; Kernel ; Kernels ; Multicore processing ; Parallel processing ; Phased arrays ; software optimization ; Workload

container_end_page	26
container_issue	3
container_start_page	16
container_title	Computing in science & engineering
container_volume	11
creator	Hwu, Wen-Mei ; Rodrigues, Christopher ; Ryoo, Shane ; Stratton, John
description	Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.
doi_str_mv	10.1109/MCSE.2009.48
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_miscellaneous_1022898615</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4814979</ieee_id><sourcerecordid>1022898615</sourcerecordid><originalsourceid>FETCH-LOGICAL-c348t-b647400f71ea87ebfa7758e1038b2143c9e0aeedc929b04bd7ddefc8f3f89e093</originalsourceid><addsrcrecordid>eNqF0M9LwzAUB_AgCs7pzZuX4smDnUmTNgmepM4fMPEwB95Cmr5iRrfWJBX235sy8eDF03uP9-HB-yJ0TvCMECxvXsrlfJZhLGdMHKAJyXOR0qJ4Pxz7jKSyIPkxOvF-jTFmQuYTdFt2m34IkKy2trFQJ_fwZQ0kd8582AAmDC4Ofd9ao4PttslysEFXtrVhd4qOGt16OPupU7R6mL-VT-ni9fG5vFukhjIR0qpgnGHccAJacKgazXkugGAqqowwaiRgDVAbmckKs6rmdQ2NEQ1tRFxJOkVX-7u96z4H8EFtrDfQtnoL3eAVKTjJOMa8-J_iLBNSxBwivfxD193gtvERJXLBKBeSRXS9R8Z13jtoVO_sRrtdvKTGzNWYuRozV0xEfrHnFgB-KROESS7pN1GCfDg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>858437894</pqid></control><display><type>article</type><title>Compute Unified Device Architecture Application Suitability</title><source>IEEE Electronic Library (IEL)</source><creator>Hwu, Wen-Mei ; Rodrigues, Christopher ; Ryoo, Shane ; Stratton, John</creator><creatorcontrib>Hwu, Wen-Mei ; Rodrigues, Christopher ; Ryoo, Shane ; Stratton, John</creatorcontrib><description>Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. 
Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.</description><identifier>ISSN: 1521-9615</identifier><identifier>EISSN: 1558-366X</identifier><identifier>DOI: 10.1109/MCSE.2009.48</identifier><identifier>CODEN: CSENFA</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Architecture ; benchmarks ; Central Processing Unit ; Computation ; compute unified device architecture ; Computer architecture ; Costs ; CUDA ; Devices ; general-purpose computing on GPU ; GPGPU ; Graphics ; Hardware ; Kernel ; Kernels ; Multicore processing ; Parallel processing ; Phased arrays ; software optimization ; Workload</subject><ispartof>Computing in science & engineering, 2009-05, Vol.11 (3), p.16-26</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2009</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c348t-b647400f71ea87ebfa7758e1038b2143c9e0aeedc929b04bd7ddefc8f3f89e093</citedby><cites>FETCH-LOGICAL-c348t-b647400f71ea87ebfa7758e1038b2143c9e0aeedc929b04bd7ddefc8f3f89e093</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4814979$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4814979$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Hwu, Wen-Mei</creatorcontrib><creatorcontrib>Rodrigues, Christopher</creatorcontrib><creatorcontrib>Ryoo, Shane</creatorcontrib><creatorcontrib>Stratton, John</creatorcontrib><title>Compute Unified Device Architecture Application 
Suitability</title><title>Computing in science & engineering</title><addtitle>CISE-M</addtitle><description>Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.</description><subject>Architecture</subject><subject>benchmarks</subject><subject>Central Processing Unit</subject><subject>Computation</subject><subject>compute unified device architecture</subject><subject>Computer architecture</subject><subject>Costs</subject><subject>CUDA</subject><subject>Devices</subject><subject>general-purpose computing on GPU</subject><subject>GPGPU</subject><subject>Graphics</subject><subject>Hardware</subject><subject>Kernel</subject><subject>Kernels</subject><subject>Multicore processing</subject><subject>Parallel processing</subject><subject>Phased arrays</subject><subject>software optimization</subject><subject>Workload</subject><issn>1521-9615</issn><issn>1558-366X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNqF0M9LwzAUB_AgCs7pzZuX4smDnUmTNgmepM4fMPEwB95Cmr5iRrfWJBX235sy8eDF03uP9-HB-yJ0TvCMECxvXsrlfJZhLGdMHKAJyXOR0qJ4Pxz7jKSyIPkxOvF-jTFmQuYTdFt2m34IkKy2trFQJ_fwZQ0kd8582AAmDC4Ofd9ao4PttslysEFXtrVhd4qOGt16OPupU7R6mL-VT-ni9fG5vFukhjIR0qpgnGHccAJacKgazXkugGAqqowwaiRgDVAbmckKs6rmdQ2NEQ1tRFxJOkVX-7u96z4H8EFtrDfQtnoL3eAVKTjJOMa8-J_iLBNSxBwivfxD193gtvERJXLBKBeSRXS9R8Z13jtoVO_sRrtdvKTGzNWYuRozV0xEfrHnFgB-KROESS7pN1GCfDg</recordid><startdate>20090501</startdate><enddate>20090501</enddate><creator>Hwu, Wen-Mei</creator><creator>Rodrigues, Christopher</creator><creator>Ryoo, Shane</creator><creator>Stratton, John</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope></search><sort><creationdate>20090501</creationdate><title>Compute Unified Device Architecture Application Suitability</title><author>Hwu, Wen-Mei ; Rodrigues, Christopher ; Ryoo, Shane ; Stratton, John</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c348t-b647400f71ea87ebfa7758e1038b2143c9e0aeedc929b04bd7ddefc8f3f89e093</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Architecture</topic><topic>benchmarks</topic><topic>Central Processing Unit</topic><topic>Computation</topic><topic>compute unified device architecture</topic><topic>Computer architecture</topic><topic>Costs</topic><topic>CUDA</topic><topic>Devices</topic><topic>general-purpose computing on GPU</topic><topic>GPGPU</topic><topic>Graphics</topic><topic>Hardware</topic><topic>Kernel</topic><topic>Kernels</topic><topic>Multicore processing</topic><topic>Parallel processing</topic><topic>Phased arrays</topic><topic>software optimization</topic><topic>Workload</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Hwu, Wen-Mei</creatorcontrib><creatorcontrib>Rodrigues, Christopher</creatorcontrib><creatorcontrib>Ryoo, Shane</creatorcontrib><creatorcontrib>Stratton, John</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications 
Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><jtitle>Computing in science & engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Hwu, Wen-Mei</au><au>Rodrigues, Christopher</au><au>Ryoo, Shane</au><au>Stratton, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Compute Unified Device Architecture Application Suitability</atitle><jtitle>Computing in science & engineering</jtitle><stitle>CISE-M</stitle><date>2009-05-01</date><risdate>2009</risdate><volume>11</volume><issue>3</issue><spage>16</spage><epage>26</epage><pages>16-26</pages><issn>1521-9615</issn><eissn>1558-366X</eissn><coden>CSENFA</coden><abstract>Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a GeForce 8800 GPU and what finally limits the achievable performance.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/MCSE.2009.48</doi><tpages>11</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1521-9615
ispartof	Computing in science & engineering, 2009-05, Vol.11 (3), p.16-26
issn	1521-9615 1558-366X
language	eng
recordid	cdi_proquest_miscellaneous_1022898615
source	IEEE Electronic Library (IEL)
subjects	Architecture ; benchmarks ; Central Processing Unit ; Computation ; compute unified device architecture ; Computer architecture ; Costs ; CUDA ; Devices ; general-purpose computing on GPU ; GPGPU ; Graphics ; Hardware ; Kernel ; Kernels ; Multicore processing ; Parallel processing ; Phased arrays ; software optimization ; Workload
title	Compute Unified Device Architecture Application Suitability
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T05%3A57%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Compute%20Unified%20Device%20Architecture%20Application%20Suitability&rft.jtitle=Computing%20in%20science%20&%20engineering&rft.au=Hwu,%20Wen-Mei&rft.date=2009-05-01&rft.volume=11&rft.issue=3&rft.spage=16&rft.epage=26&rft.pages=16-26&rft.issn=1521-9615&rft.eissn=1558-366X&rft.coden=CSENFA&rft_id=info:doi/10.1109/MCSE.2009.48&rft_dat=%3Cproquest_RIE%3E1022898615%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=858437894&rft_id=info:pmid/&rft_ieee_id=4814979&rfr_iscdi=true