Lessons learned at 208K: Towards debugging millions of cores

Bibliographic Details
Main authors: Lee, G.L., Ahn, D.H., Arnold, D.C., de Supinski, B.R., Legendre, M., Miller, B.P., Schulz, M., Liblit, B.
Format: Conference Proceeding
Language: English
Subjects:
Online access: Request full text
container_start_page 1
container_end_page 9
container_title 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
creator Lee, G.L.
Ahn, D.H.
Arnold, D.C.
de Supinski, B.R.
Legendre, M.
Miller, B.P.
Schulz, M.
Liblit, B.
description Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and to process application data. In addition, at such scales, each tool itself will become a large parallel application - already, debugging the full BlueGene/L (BG/L) installation at the Lawrence Livermore National Laboratory requires employing 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the stack trace analysis tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an Infiniband cluster and results up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
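
The description above explains STAT's core operation: it collects one stack trace per task and merges them so that tasks with identical call paths fall into the same equivalence class. The following minimal Python sketch only illustrates that merging idea; it is not STAT's actual MRNet-based implementation, and the data layout and function names (merge_traces, equivalence_classes) are invented for this example.

# Hypothetical sketch: merge per-task stack traces into a prefix tree and
# group tasks (MPI ranks) whose traces are identical. Not STAT's real code.

from collections import defaultdict

def merge_traces(traces):
    """traces maps task rank -> list of call frames, outermost first.
    Returns a nested dict: frame -> (set of ranks reaching it, children)."""
    root = {}
    for rank, frames in traces.items():
        node = root
        for frame in frames:
            ranks, children = node.setdefault(frame, (set(), {}))
            ranks.add(rank)
            node = children
    return root

def equivalence_classes(traces):
    """Group task ranks that share an identical stack trace."""
    classes = defaultdict(set)
    for rank, frames in traces.items():
        classes[tuple(frames)].add(rank)
    return classes

if __name__ == "__main__":
    # Toy input: three tasks, two of which are blocked in the same barrier.
    traces = {
        0: ["main", "MPI_Barrier", "poll"],
        1: ["main", "MPI_Barrier", "poll"],
        2: ["main", "compute", "dgemm"],
    }
    tree = merge_traces(traces)
    print(sorted(tree["main"][0]))  # all ranks reached main -> [0, 1, 2]
    for trace, ranks in equivalence_classes(traces).items():
        print(sorted(ranks), "->", " > ".join(trace))

In the scenario the abstract describes, this merge would run hierarchically over a tree of tool daemons connected by a scalable communication infrastructure rather than in a single process, which is what allows the analysis to reach hundreds of thousands of tasks.
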
doi_str_mv 10.1109/SC.2008.5218557
format Conference Proceeding
identifier ISSN: 2167-4329
ispartof 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2008, p.1-9
issn 2167-4329
2167-4337
language eng
recordid cdi_ieee_primary_5218557
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Algorithm design and analysis
Application software
Data analysis
Data structures
Debugging
File systems
Laboratories
Large-scale systems
Scalability
System software
title Lessons learned at 208K: Towards debugging millions of cores
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T04%3A37%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Lessons%20learned%20at%20208K:%20Towards%20debugging%20millions%20of%20cores&rft.btitle=2008%20SC%20-%20International%20Conference%20for%20High%20Performance%20Computing,%20Networking,%20Storage%20and%20Analysis&rft.au=Lee,%20G.L.&rft.date=2008-11&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=2167-4329&rft.eissn=2167-4337&rft.isbn=9781424428342&rft.isbn_list=1424428343&rft_id=info:doi/10.1109/SC.2008.5218557&rft_dat=%3Cieee_6IE%3E5218557%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424428351&rft.eisbn_list=9781424428359&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5218557&rfr_iscdi=true