Scaling Linear Algebra Kernels Using Remote Memory Access
This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared memory systems. Experimental results on large-scale systems (a Linux-InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, our RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes.
Saved in:
Main Authors: | Krishnan, Manojkumar; Lewis, Robert R; Vishnu, Abhinav |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
container_end_page | 376 |
---|---|
container_issue | |
container_start_page | 369 |
container_title | 2010 39th International Conference on Parallel Processing Workshops |
container_volume | |
creator | Krishnan, Manojkumar; Lewis, Robert R; Vishnu, Abhinav |
description | This paper describes the scalability of linear algebra kernels based on a remote memory access (RMA) approach. The approach differs from other parallel linear algebra algorithms in its explicit use of shared memory and RMA communication rather than message passing, and it is suitable for clusters and scalable shared memory systems. Experimental results on large-scale systems (a Linux-InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms in use today. For example, on a Cray XT4 with a matrix size of 102400, our RMA-based matrix multiplication achieved over 55 teraflops, while ScaLAPACK's pdgemm measured close to 42 teraflops on 10000 processes. |
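The description contrasts one-sided RMA communication with message passing: a process reads or writes remote data directly, without the remote side posting a matching receive. Below is a minimal sketch of that one-sided style using standard MPI-3 RMA; the paper itself builds on ARMCI/Global Arrays (listed among the subject terms), so the window, block size, and neighbor pattern here are illustrative assumptions rather than the authors' code.

```c
/* Minimal sketch of one-sided (RMA) communication, the style the record's
 * abstract contrasts with message passing. Uses MPI-3 RMA for illustration;
 * the paper uses ARMCI/Global Arrays. Compile: mpicc rma_sketch.c -o rma_sketch */
#include <mpi.h>
#include <stdio.h>

#define BLOCK 4  /* elements per process-local block (illustrative size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process exposes a block of a distributed array through a window. */
    double *local;
    MPI_Win win;
    MPI_Win_allocate(BLOCK * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &local, &win);
    for (int i = 0; i < BLOCK; i++)
        local[i] = rank * 100.0 + i;

    /* One-sided get: fetch a neighbor's block without the neighbor posting
     * a matching receive -- the key difference from message passing. */
    double remote[BLOCK];
    int neighbor = (rank + 1) % nprocs;
    MPI_Win_lock(MPI_LOCK_SHARED, neighbor, 0, win);
    MPI_Get(remote, BLOCK, MPI_DOUBLE, neighbor, 0, BLOCK, MPI_DOUBLE, win);
    MPI_Win_unlock(neighbor, win);  /* completes the transfer */

    printf("rank %d fetched remote[0]=%.1f from rank %d\n",
           rank, remote[0], neighbor);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```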
doi_str_mv | 10.1109/ICPPW.2010.57 |
format | Conference Proceeding |
fullrecord | Krishnan, Manojkumar; Lewis, Robert R; Vishnu, Abhinav (Pacific Northwest National Lab. (PNNL), Richland, WA, United States). "Scaling Linear Algebra Kernels Using Remote Memory Access." In: 2010 39th International Conference on Parallel Processing Workshops, IEEE, September 2010, pp. 369-376. ISSN: 0190-3918; EISSN: 2332-5690; ISBN: 1424479185, 9781424479184; EISBN: 0769541577, 9780769541570. DOI: 10.1109/ICPPW.2010.57 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0190-3918 |
ispartof | 2010 39th International Conference on Parallel Processing Workshops, 2010, p.369-376 |
issn | 0190-3918 2332-5690 |
language | eng |
recordid | cdi_ieee_primary_5599095 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | ALGEBRA; ALGORITHMS; armci; Clustering algorithms; COMMUNICATIONS; Data models; GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; global arrays; IMPLEMENTATION; Kernel; KERNELS; Linear algebra; Message passing; one sided communication; parallel linear algebra; PARALLEL PROCESSING; Protocols; Remote memory access; remote memory access, one-sided communication, global arrays, ARMCI; Scalability |
title | Scaling Linear Algebra Kernels Using Remote Memory Access |