Scaling Linear Algebra Kernels Using Remote Memory Access

This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, the authors' RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
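For readers unfamiliar with one-sided communication, the minimal sketch below illustrates the RMA pattern the abstract contrasts with message passing. It is an illustration only: the paper's kernels are built on Global Arrays/ARMCI, whereas this sketch uses standard MPI-2 one-sided operations, and the block size and data layout are assumptions rather than the paper's configuration. One process exposes its local matrix block in a memory window; another process pulls that block directly with MPI_Get, with no matching receive posted by the owner.

/* Illustrative sketch only -- not the paper's Global Arrays/ARMCI code.
 * Each process exposes a local matrix block through an MPI RMA window;
 * rank 0 fetches rank 1's block with a one-sided MPI_Get, so the owner
 * never posts a matching receive.  BLOCK is an assumed block dimension. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK 512                       /* assumed block dimension */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Expose this process's local block of the distributed matrix. */
    double *local;
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)(BLOCK * BLOCK * sizeof(double)), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &local, &win);
    for (int i = 0; i < BLOCK * BLOCK; i++) local[i] = (double)rank;

    /* One-sided transfer: rank 0 pulls rank 1's block directly. */
    double *remote = malloc((size_t)BLOCK * BLOCK * sizeof(double));
    MPI_Win_fence(0, win);
    if (rank == 0 && nprocs > 1)
        MPI_Get(remote, BLOCK * BLOCK, MPI_DOUBLE,
                1, 0, BLOCK * BLOCK, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);
    /* Rank 0 could now multiply its own block with the fetched block. */

    free(remote);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

In a full kernel, such remote gets would typically be overlapped with local dgemm calls on blocks already fetched; the paper reports the resulting performance against ScaLAPACK's pdgemm.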

Detailed Description

Bibliographic Details
Main Authors: Krishnan, Manojkumar; Lewis, Robert R; Vishnu, Abhinav
Format: Conference Proceedings
Language: English
container_end_page 376
container_start_page 369
creator Krishnan, Manojkumar
Lewis, Robert R
Vishnu, Abhinav
description This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, the authors' RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
doi_str_mv 10.1109/ICPPW.2010.57
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 0190-3918
ispartof 2010 39th International Conference on Parallel Processing Workshops, 2010, p.369-376
issn 0190-3918
2332-5690
language eng
recordid cdi_ieee_primary_5599095
source IEEE Electronic Library (IEL) Conference Proceedings
subjects ALGEBRA
ALGORITHMS
armci
Clustering algorithms
COMMUNICATIONS
Data models
GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
global arrays
IMPLEMENTATION
Kernel
KERNELS
Linear algebra
Message passing
one sided communication
parallel linear algebra
PARALLEL PROCESSING
Protocols
Remote memory access
remote memory access, one-sided communication, global arrays, ARMCI
Scalability
title Scaling Linear Algebra Kernels Using Remote Memory Access
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T11%3A23%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-osti_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Scaling%20Linear%20Algebra%20Kernels%20Using%20Remote%20Memory%20Access&rft.btitle=2010%2039th%20International%20Conference%20on%20Parallel%20Processing%20Workshops&rft.au=Krishnan,%20Manojkumar&rft.aucorp=Pacific%20Northwest%20National%20Lab.%20(PNNL),%20Richland,%20WA%20(United%20States)&rft.date=2010-09&rft.spage=369&rft.epage=376&rft.pages=369-376&rft.issn=0190-3918&rft.eissn=2332-5690&rft.isbn=1424479185&rft.isbn_list=9781424479184&rft_id=info:doi/10.1109/ICPPW.2010.57&rft_dat=%3Costi_6IE%3E994036%3C/osti_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=0769541577&rft.eisbn_list=9780769541570&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5599095&rfr_iscdi=true