Batched matrix computations on hardware accelerators based on GPUs
Scientific applications require solvers that work on many small-size problems that are independent from each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, so there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems, which the authors call batched factorizations. The many applications that need this functionality could especially benefit from the use of GPUs, which currently are four to five times more energy efficient than multicore CPUs on important scientific workloads. This paper consequently describes the development of the most common one-sided factorizations, Cholesky, LU, and QR, for a set of small dense matrices. The algorithms the authors present, together with their implementations, are by design inherently parallel. Their approach is more efficient than a combination of multicore CPUs and GPUs for the problem sizes of interest in the application use cases.
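The batched one-sided factorizations described in the abstract operate on many independent small matrices at once. As a minimal CPU-side sketch of the idea (illustrative only, not the authors' GPU implementation), NumPy's stacked-matrix support factors a whole batch in one call:

```python
import numpy as np

def batched_cholesky(batch):
    """Factor each SPD matrix in a (b, n, n) stack as A[i] = L[i] @ L[i].T.

    The factorizations are mutually independent, which is what makes the
    batched formulation map well onto throughput-oriented hardware: on a
    GPU, each small matrix can be assigned to its own thread block.
    """
    # np.linalg.cholesky broadcasts over leading axes, so a single call
    # factors the entire batch.
    return np.linalg.cholesky(batch)

# Build a batch of 1000 small (8x8) symmetric positive definite matrices.
rng = np.random.default_rng(0)
m = rng.standard_normal((1000, 8, 8))
spd = m @ m.transpose(0, 2, 1) + 8.0 * np.eye(8)

L = batched_cholesky(spd)
# Each lower-triangular factor reconstructs its own input matrix.
assert np.allclose(L @ L.transpose(0, 2, 1), spd)
```

Processing the matrices as one batch, rather than in a loop of per-matrix calls, amortizes the fixed per-call (or, on a GPU, per-kernel-launch) overhead that otherwise dominates at these small sizes.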
Saved in:
Published in: | The international journal of high performance computing applications 2015-06, Vol.29 (2), p.193 |
---|---|
Main authors: | Haidar, Azzam; Dong, Tingxing; Luszczek, Piotr; Tomov, Stanimire; Dongarra, Jack |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 2 |
container_start_page | 193 |
container_title | The international journal of high performance computing applications |
container_volume | 29 |
creator | Haidar, Azzam; Dong, Tingxing; Luszczek, Piotr; Tomov, Stanimire; Dongarra, Jack |
description | Scientific applications require solvers that work on many small-size problems that are independent from each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, so there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems, which the authors call batched factorizations. The many applications that need this functionality could especially benefit from the use of GPUs, which currently are four to five times more energy efficient than multicore CPUs on important scientific workloads. This paper consequently describes the development of the most common one-sided factorizations, Cholesky, LU, and QR, for a set of small dense matrices. The algorithms the authors present, together with their implementations, are by design inherently parallel. Their approach is more efficient than a combination of multicore CPUs and GPUs for the problem sizes of interest in the application use cases. |
doi_str_mv | 10.1177/1094342014567546 |
format | Article |
publisher | London: SAGE PUBLICATIONS, INC |
publication_date | 2015-06-01 |
eissn | 1741-2846 |
author_affiliations | Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States) |
rights | Copyright SAGE PUBLICATIONS, INC. May 2015 |
peer_reviewed | true |
open_access | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1094-3420 |
ispartof | The international journal of high performance computing applications, 2015-06, Vol.29 (2), p.193 |
issn | 1094-3420 (print); 1741-2846 (electronic) |
language | eng |
recordid | cdi_osti_scitechconnect_1361289 |
source | Access via SAGE; Alma/SFX Local Collection |
subjects | Algorithms; batched factorization; Computer peripherals; hardware accelerators; High performance computing; Integrated circuits; MATHEMATICS AND COMPUTING; numerical linear algebra; numerical software libraries; one-sided factorization algorithms; Problem solving; Studies |
title | Batched matrix computations on hardware accelerators based on GPUs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T10%3A20%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Batched%20matrix%20computations%20on%20hardware%20accelerators%20based%20on%20GPUs&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Haidar,%20Azzam&rft.aucorp=Univ.%20of%20Tennessee,%20Knoxville,%20TN%20(United%20States)&rft.date=2015-06-01&rft.volume=29&rft.issue=2&rft.spage=193&rft.pages=193-&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/1094342014567546&rft_dat=%3Cproquest_osti_%3E3687444591%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1681283320&rft_id=info:pmid/&rfr_iscdi=true |