Multifrontal Factorization of Sparse SPD Matrices on GPUs

Solving large sparse linear systems is often the most computationally intensive component of many scientific computing applications. In the past, sparse multifrontal direct factorization has been shown to scale to thousands of processors on dedicated supercomputers, resulting in a substantial reduction in computational time. In recent years, an alternative computing paradigm based on GPUs has gained prominence, primarily due to its affordability, power efficiency, and the potential to achieve significant speedup relative to desktop performance on regular and structured parallel applications. However, sparse matrix factorization on GPUs has not been explored sufficiently, owing to the complexity of an efficient implementation and concerns about low GPU utilization. In this paper, we present an adaptive hybrid approach for accelerating sparse multifrontal factorization based on a judicious exploitation of the processing power of the host CPU and GPU. We present four different policies for distributing and scheduling the workload between the host CPU and the GPU, and propose a mechanism for runtime selection of the appropriate policy for each step of sparse Cholesky factorization. This mechanism relies on auto-tuning based on modeling the best-policy predictor as a parametric classifier. We estimate the classifier parameters from the available empirical computation time data such that the expected computation time is minimized. This approach is readily adaptable for using the current or an extended set of policies for different CPU-GPU combinations, as well as for different combinations of dense kernels for both the CPU and the GPU.
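The abstract describes a runtime mechanism that chooses, for each step of the sparse Cholesky factorization, one of four CPU/GPU work-distribution policies using a parametric classifier auto-tuned from measured computation times. As a rough illustration of that selection idea only, here is a minimal Python sketch; the policy names, the feature set, and the least-squares tuning are assumptions made for the example, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): runtime selection among four
# CPU/GPU work-distribution policies for one frontal matrix, loosely modeled
# on the abstract's idea of a parametric classifier tuned from measured
# factorization times. Policy names, features, and the least-squares tuning
# are illustrative assumptions.
import numpy as np

POLICIES = ["cpu_only", "gpu_only", "split_panels", "gpu_update_cpu_factor"]


def policy_scores(front_size, num_pivots, theta):
    """Linear score per policy from simple frontal-matrix features.

    theta has shape (4, 3): one weight row per policy over the features
    [1, log(front_size), log(num_pivots)]. A higher score means the policy
    is predicted to be faster.
    """
    x = np.array([1.0, np.log(front_size), np.log(num_pivots)])
    return theta @ x


def select_policy(front_size, num_pivots, theta):
    """Pick the policy predicted to be fastest for this frontal matrix."""
    return POLICIES[int(np.argmax(policy_scores(front_size, num_pivots, theta)))]


def fit_theta(samples):
    """Crude auto-tuning stand-in: fit each policy's weights by least squares
    to negative measured time, so that larger scores correspond to faster runs.

    samples: iterable of (front_size, num_pivots, policy_index, seconds).
    The paper instead estimates classifier parameters so that the *expected*
    factorization time is minimized; this fit only illustrates the idea of
    tuning the selector from empirical timing data.
    """
    theta = np.zeros((len(POLICIES), 3))
    for p in range(len(POLICIES)):
        feats, targets = [], []
        for front_size, num_pivots, policy_index, seconds in samples:
            if policy_index == p:
                feats.append([1.0, np.log(front_size), np.log(num_pivots)])
                targets.append(-seconds)
        if feats:
            theta[p], *_ = np.linalg.lstsq(
                np.array(feats), np.array(targets), rcond=None
            )
    return theta
```

In use, one would time each policy on a sample of frontal matrices, call fit_theta on those measurements, and then call select_policy(front_size, num_pivots, theta) at each step of the factorization.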

Bibliographic Details
Main Authors: George, T.; Saxena, V.; Gupta, A.; Singh, A.; Choudhury, A. R.
Format: Conference Proceeding
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium, May 2011, pp. 372-383
Publisher: IEEE
DOI: 10.1109/IPDPS.2011.44
ISSN: 1530-2075
ISBN: 9781612843728; 1612843727
E-ISBN: 9780769543857; 0769543855
Language: English; Japanese
Subjects: Computational modeling; Graphics processing unit; Instruction sets; Kernel; Libraries; Sparse matrices; Symmetric matrices
Source: IEEE Electronic Library (IEL) Conference Proceedings
Online Access: Order full text