Automated service monitoring in the deployment of ARCHER2

The ARCHER2 service, a CPU-based HPE Cray EX system with 750,080 cores (5,860 nodes), was deployed throughout 2020 and 2021, going into full service in December 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 wa...

Bibliographic details
Published in: arXiv.org 2023-03
Main authors: Leach, Kieran; Cass, Philip; Robson, Steven; Kazakevicius, Eimantas; Lafferty, Martin; Turner, Andrew; Simpson, Alan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Leach, Kieran
Cass, Philip
Robson, Steven
Kazakevicius, Eimantas
Lafferty, Martin
Turner, Andrew
Simpson, Alan
description The ARCHER2 service, a CPU-based HPE Cray EX system with 750,080 cores (5,860 nodes), was deployed throughout 2020 and 2021, going into full service in December 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team during a global pandemic, when collaboration and co-working were significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status, which needed to be made available to external parties for diagnosis and interpretation. We describe how these checks were deployed and how the data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500, and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2789554201</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2789554201</sourcerecordid><originalsourceid>FETCH-proquest_journals_27895542013</originalsourceid><addsrcrecordid>eNqNykELgjAYgOERBEn5Hz7oLMxvLvUoYniW7iL5aRPdbJtB_74O_YBO7-F5dyxAIeIoSxAPLHRu4pzjJUUpRcDyYvNm6Tz14Mi-1J1gMVp5Y5UeQWnwD4Ke1tm8F9IezABFU9ZVgye2H7rZUfjrkZ2v1a2so9Wa50bOt5PZrP5Si2mWS5kgj8V_1wdcaDXk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2789554201</pqid></control><display><type>article</type><title>Automated service monitoring in the deployment of ARCHER2</title><source>Freely Accessible Journals</source><creator>Leach, Kieran ; Cass, Philip ; Robson, Steven ; Kazakevicius, Eimantas ; Lafferty, Martin ; Turner, Andrew ; Simpson, Alan</creator><creatorcontrib>Leach, Kieran ; Cass, Philip ; Robson, Steven ; Kazakevicius, Eimantas ; Lafferty, Martin ; Turner, Andrew ; Simpson, Alan</creatorcontrib><description>The ARCHER2 service, a CPU based HPE Cray EX system with 750,080 cores (5,860 nodes), has been deployed throughout 2020 and 2021, going into full service in December of 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team through a global pandemic situation where collaboration and co-working was significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status which needed to be made available to external parties for diagnosis and interpretation. 
We will describe how these checks have been deployed and how data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500 and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Automation ; Availability ; Collaboration ; Commissioning ; Cooperation ; Monitoring</subject><ispartof>arXiv.org, 2023-03</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Leach, Kieran</creatorcontrib><creatorcontrib>Cass, Philip</creatorcontrib><creatorcontrib>Robson, Steven</creatorcontrib><creatorcontrib>Kazakevicius, Eimantas</creatorcontrib><creatorcontrib>Lafferty, Martin</creatorcontrib><creatorcontrib>Turner, Andrew</creatorcontrib><creatorcontrib>Simpson, Alan</creatorcontrib><title>Automated service monitoring in the deployment of ARCHER2</title><title>arXiv.org</title><description>The ARCHER2 service, a CPU based HPE Cray EX system with 750,080 cores (5,860 nodes), has been deployed throughout 2020 and 2021, going into full service in December of 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. 
As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team through a global pandemic situation where collaboration and co-working was significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status which needed to be made available to external parties for diagnosis and interpretation. We will describe how these checks have been deployed and how data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500 and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.</description><subject>Automation</subject><subject>Availability</subject><subject>Collaboration</subject><subject>Commissioning</subject><subject>Cooperation</subject><subject>Monitoring</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNykELgjAYgOERBEn5Hz7oLMxvLvUoYniW7iL5aRPdbJtB_74O_YBO7-F5dyxAIeIoSxAPLHRu4pzjJUUpRcDyYvNm6Tz14Mi-1J1gMVp5Y5UeQWnwD4Ke1tm8F9IezABFU9ZVgye2H7rZUfjrkZ2v1a2so9Wa50bOt5PZrP5Si2mWS5kgj8V_1wdcaDXk</recordid><startdate>20230321</startdate><enddate>20230321</enddate><creator>Leach, Kieran</creator><creator>Cass, Philip</creator><creator>Robson, Steven</creator><creator>Kazakevicius, Eimantas</creator><creator>Lafferty, Martin</creator><creator>Turner, Andrew</creator><creator>Simpson, Alan</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20230321</creationdate><title>Automated service monitoring in the deployment of ARCHER2</title><author>Leach, Kieran ; Cass, Philip ; Robson, Steven ; Kazakevicius, Eimantas ; Lafferty, Martin ; Turner, Andrew ; Simpson, Alan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_27895542013</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Automation</topic><topic>Availability</topic><topic>Collaboration</topic><topic>Commissioning</topic><topic>Cooperation</topic><topic>Monitoring</topic><toplevel>online_resources</toplevel><creatorcontrib>Leach, Kieran</creatorcontrib><creatorcontrib>Cass, Philip</creatorcontrib><creatorcontrib>Robson, Steven</creatorcontrib><creatorcontrib>Kazakevicius, Eimantas</creatorcontrib><creatorcontrib>Lafferty, Martin</creatorcontrib><creatorcontrib>Turner, Andrew</creatorcontrib><creatorcontrib>Simpson, Alan</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium 
Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Leach, Kieran</au><au>Cass, Philip</au><au>Robson, Steven</au><au>Kazakevicius, Eimantas</au><au>Lafferty, Martin</au><au>Turner, Andrew</au><au>Simpson, Alan</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Automated service monitoring in the deployment of ARCHER2</atitle><jtitle>arXiv.org</jtitle><date>2023-03-21</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>The ARCHER2 service, a CPU based HPE Cray EX system with 750,080 cores (5,860 nodes), has been deployed throughout 2020 and 2021, going into full service in December of 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team through a global pandemic situation where collaboration and co-working was significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status which needed to be made available to external parties for diagnosis and interpretation. 
We will describe how these checks have been deployed and how data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500 and contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-03
issn 2331-8422
language eng
recordid cdi_proquest_journals_2789554201
source Freely Accessible Journals
subjects Automation
Availability
Collaboration
Commissioning
Cooperation
Monitoring
title Automated service monitoring in the deployment of ARCHER2
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T07%3A58%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Automated%20service%20monitoring%20in%20the%20deployment%20of%20ARCHER2&rft.jtitle=arXiv.org&rft.au=Leach,%20Kieran&rft.date=2023-03-21&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2789554201%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2789554201&rft_id=info:pmid/&rfr_iscdi=true