Scaling Linear Algebra Kernels Using Remote Memory Access

This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, the authors' RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
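For readers unfamiliar with one-sided communication, the minimal sketch below illustrates the RMA pattern the abstract contrasts with message passing. It is an illustration only: the paper's kernels are built on Global Arrays/ARMCI, whereas this sketch uses standard MPI-2 one-sided operations, and the block size and data layout are assumptions rather than the paper's configuration. One process exposes its local matrix block in a memory window; another process pulls that block directly with MPI_Get, with no matching receive posted by the owner.

/* Illustrative sketch only -- not the paper's Global Arrays/ARMCI code.
 * Each process exposes a local matrix block through an MPI RMA window;
 * rank 0 fetches rank 1's block with a one-sided MPI_Get, so the owner
 * never posts a matching receive.  BLOCK is an assumed block dimension. */
#include <mpi.h>
#include <stdlib.h>

#define BLOCK 512                       /* assumed block dimension */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Expose this process's local block of the distributed matrix. */
    double *local;
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)(BLOCK * BLOCK * sizeof(double)), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &local, &win);
    for (int i = 0; i < BLOCK * BLOCK; i++) local[i] = (double)rank;

    /* One-sided transfer: rank 0 pulls rank 1's block directly. */
    double *remote = malloc((size_t)BLOCK * BLOCK * sizeof(double));
    MPI_Win_fence(0, win);
    if (rank == 0 && nprocs > 1)
        MPI_Get(remote, BLOCK * BLOCK, MPI_DOUBLE,
                1, 0, BLOCK * BLOCK, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);
    /* Rank 0 could now multiply its own block with the fetched block. */

    free(remote);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

In a full kernel, such remote gets would typically be overlapped with local dgemm calls on blocks already fetched; the paper reports the resulting performance against ScaLAPACK's pdgemm.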

Detailed Description

Bibliographic Details
Main Authors: Krishnan, Manojkumar; Lewis, Robert R; Vishnu, Abhinav
Format: Conference Proceedings
Language: English
container_end_page 376
container_start_page 369
creator Krishnan, Manojkumar
Lewis, Robert R
Vishnu, Abhinav
description This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, the authors' RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
doi_str_mv 10.1109/ICPPW.2010.57
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 0190-3918
ispartof 2010 39th International Conference on Parallel Processing Workshops, 2010, p.369-376
issn 0190-3918
2332-5690
language eng
recordid cdi_ieee_primary_5599095
source IEEE Electronic Library (IEL) Conference Proceedings
subjects ALGEBRA
ALGORITHMS
armci
Clustering algorithms
COMMUNICATIONS
Data models
GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
global arrays
IMPLEMENTATION
Kernel
KERNELS
Linear algebra
Message passing
one sided communication
parallel linear algebra
PARALLEL PROCESSING
Protocols
Remote memory access
remote memory access, one-sided communication, global arrays, ARMCI
Scalability
title Scaling Linear Algebra Kernels Using Remote Memory Access
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T11%3A23%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-osti_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Scaling%20Linear%20Algebra%20Kernels%20Using%20Remote%20Memory%20Access&rft.btitle=2010%2039th%20International%20Conference%20on%20Parallel%20Processing%20Workshops&rft.au=Krishnan,%20Manojkumar&rft.aucorp=Pacific%20Northwest%20National%20Lab.%20(PNNL),%20Richland,%20WA%20(United%20States)&rft.date=2010-09&rft.spage=369&rft.epage=376&rft.pages=369-376&rft.issn=0190-3918&rft.eissn=2332-5690&rft.isbn=1424479185&rft.isbn_list=9781424479184&rft_id=info:doi/10.1109/ICPPW.2010.57&rft_dat=%3Costi_6IE%3E994036%3C/osti_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=0769541577&rft.eisbn_list=9780769541570&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5599095&rfr_iscdi=true