AI4IO: A suite of AI-based tools for IO-aware scheduling

Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file systems (PFS) performance degradation, IO contention must be identified and address...

Bibliographic details
Published in: The international journal of high performance computing applications, 2022-05, Vol.36 (3), p.370-387
Main authors: Wyatt, Michael R, Herbein, Stephen, Gamblin, Todd, Taufer, Michela
Format: Article
Language: English
Online access: Full text
container_end_page 387
container_issue 3
container_start_page 370
container_title The international journal of high performance computing applications
container_volume 36
creator Wyatt, Michael R
Herbein, Stephen
Gamblin, Todd
Taufer, Michela
description Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether it stems from bursts of IO demand or from parallel file system (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools that use AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. Our results show that integrating AI4IO into Flux improves the workload makespan by up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload.
doi_str_mv 10.1177/10943420221079765
format Article
fullrecord <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1860988</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_10943420221079765</sage_id><sourcerecordid>2665044523</sourcerecordid><originalsourceid>FETCH-LOGICAL-c291t-f163199071563d00ed68fa73dfa194b473f38455d971fd09171bd786ec7c99563</originalsourceid><addsrcrecordid>eNp1kMtKAzEUhoMoWKsP4C7oemrOJJOLu0G8DBS60XVIc2mn1EmdzCC-vSkjuBBX58D5vp_Dj9A1kAWAEHdAFKOsJGUJRCjBqxM0A8GgKCXjp3nP9-IInKOLlHaEEM5oNUOyblizusc1TmM7eBwDrptibZJ3eIhxn3CIPW5Whfk0vcfJbr0b9223uURnweyTv_qZc_T29Pj68FIsV8_NQ70sbKlgKAJwCkoRARWnjhDvuAxGUBcMKLZmggYqWVU5JSA4okDA2gnJvRVWqezM0c2UG9PQ6mTzk3ZrY9d5O2iQnCgpM3Q7QYc-fow-DXoXx77Lf-mS84owVpU0UzBRto8p9T7oQ9--m_5LA9HHFvWfFrOzmJxkNv439X_hGyivbL4</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2665044523</pqid></control><display><type>article</type><title>AI4IO: A suite of AI-based tools for IO-aware scheduling</title><source>SAGE Complete A-Z List</source><source>Alma/SFX Local Collection</source><creator>Wyatt, Michael R ; Herbein, Stephen ; Gamblin, Todd ; Taufer, Michela</creator><creatorcontrib>Wyatt, Michael R ; Herbein, Stephen ; Gamblin, Todd ; Taufer, Michela</creatorcontrib><description>Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file systems (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools using AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. 
CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. Our results show that integrating AI4IO into Flux improves the workload makespan up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload.</description><identifier>ISSN: 1094-3420</identifier><identifier>EISSN: 1741-2846</identifier><identifier>DOI: 10.1177/10943420221079765</identifier><language>eng</language><publisher>London, England: SAGE Publications</publisher><subject>Managers ; Performance degradation ; Resource allocation ; Workload ; Workloads</subject><ispartof>The international journal of high performance computing applications, 2022-05, Vol.36 (3), p.370-387</ispartof><rights>The Author(s) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c291t-f163199071563d00ed68fa73dfa194b473f38455d971fd09171bd786ec7c99563</cites><orcidid>0000-0002-0031-6377 ; 0000-0003-0141-0653 ; 0000000301410653 ; 0000000200316377</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://journals.sagepub.com/doi/pdf/10.1177/10943420221079765$$EPDF$$P50$$Gsage$$H</linktopdf><linktohtml>$$Uhttps://journals.sagepub.com/doi/10.1177/10943420221079765$$EHTML$$P50$$Gsage$$H</linktohtml><link.rule.ids>230,314,776,780,881,21799,27903,27904,43600,43601</link.rule.ids><backlink>$$Uhttps://www.osti.gov/biblio/1860988$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Wyatt, Michael R</creatorcontrib><creatorcontrib>Herbein, Stephen</creatorcontrib><creatorcontrib>Gamblin, Todd</creatorcontrib><creatorcontrib>Taufer, Michela</creatorcontrib><title>AI4IO: A 
suite of AI-based tools for IO-aware scheduling</title><title>The international journal of high performance computing applications</title><description>Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file systems (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools using AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. 
Our results show that integrating AI4IO into Flux improves the workload makespan up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload.</description><subject>Managers</subject><subject>Performance degradation</subject><subject>Resource allocation</subject><subject>Workload</subject><subject>Workloads</subject><issn>1094-3420</issn><issn>1741-2846</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp1kMtKAzEUhoMoWKsP4C7oemrOJJOLu0G8DBS60XVIc2mn1EmdzCC-vSkjuBBX58D5vp_Dj9A1kAWAEHdAFKOsJGUJRCjBqxM0A8GgKCXjp3nP9-IInKOLlHaEEM5oNUOyblizusc1TmM7eBwDrptibZJ3eIhxn3CIPW5Whfk0vcfJbr0b9223uURnweyTv_qZc_T29Pj68FIsV8_NQ70sbKlgKAJwCkoRARWnjhDvuAxGUBcMKLZmggYqWVU5JSA4okDA2gnJvRVWqezM0c2UG9PQ6mTzk3ZrY9d5O2iQnCgpM3Q7QYc-fow-DXoXx77Lf-mS84owVpU0UzBRto8p9T7oQ9--m_5LA9HHFvWfFrOzmJxkNv439X_hGyivbL4</recordid><startdate>20220501</startdate><enddate>20220501</enddate><creator>Wyatt, Michael R</creator><creator>Herbein, Stephen</creator><creator>Gamblin, Todd</creator><creator>Taufer, Michela</creator><general>SAGE Publications</general><general>SAGE PUBLICATIONS, INC</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-0031-6377</orcidid><orcidid>https://orcid.org/0000-0003-0141-0653</orcidid><orcidid>https://orcid.org/0000000301410653</orcidid><orcidid>https://orcid.org/0000000200316377</orcidid></search><sort><creationdate>20220501</creationdate><title>AI4IO: A suite of AI-based tools for IO-aware scheduling</title><author>Wyatt, Michael R ; Herbein, Stephen ; Gamblin, Todd ; Taufer, 
Michela</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c291t-f163199071563d00ed68fa73dfa194b473f38455d971fd09171bd786ec7c99563</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Managers</topic><topic>Performance degradation</topic><topic>Resource allocation</topic><topic>Workload</topic><topic>Workloads</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wyatt, Michael R</creatorcontrib><creatorcontrib>Herbein, Stephen</creatorcontrib><creatorcontrib>Gamblin, Todd</creatorcontrib><creatorcontrib>Taufer, Michela</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>OSTI.GOV</collection><jtitle>The international journal of high performance computing applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wyatt, Michael R</au><au>Herbein, Stephen</au><au>Gamblin, Todd</au><au>Taufer, Michela</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AI4IO: A suite of AI-based tools for IO-aware scheduling</atitle><jtitle>The international journal of high performance computing applications</jtitle><date>2022-05-01</date><risdate>2022</risdate><volume>36</volume><issue>3</issue><spage>370</spage><epage>387</epage><pages>370-387</pages><issn>1094-3420</issn><eissn>1741-2846</eissn><abstract>Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and 
even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file systems (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools using AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. Our results show that integrating AI4IO into Flux improves the workload makespan up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload.</abstract><cop>London, England</cop><pub>SAGE Publications</pub><doi>10.1177/10943420221079765</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-0031-6377</orcidid><orcidid>https://orcid.org/0000-0003-0141-0653</orcidid><orcidid>https://orcid.org/0000000301410653</orcidid><orcidid>https://orcid.org/0000000200316377</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1094-3420
ispartof The international journal of high performance computing applications, 2022-05, Vol.36 (3), p.370-387
issn 1094-3420
1741-2846
language eng
recordid cdi_osti_scitechconnect_1860988
source SAGE Complete A-Z List; Alma/SFX Local Collection
subjects Managers
Performance degradation
Resource allocation
Workload
Workloads
title AI4IO: A suite of AI-based tools for IO-aware scheduling
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T18%3A20%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AI4IO:%20A%20suite%20of%20AI-based%20tools%20for%20IO-aware%20scheduling&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Wyatt,%20Michael%20R&rft.date=2022-05-01&rft.volume=36&rft.issue=3&rft.spage=370&rft.epage=387&rft.pages=370-387&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/10943420221079765&rft_dat=%3Cproquest_osti_%3E2665044523%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665044523&rft_id=info:pmid/&rft_sage_id=10.1177_10943420221079765&rfr_iscdi=true