RuYi: Optimizing Burst Buffer through Automated, Fine-Grained Process-to-BB Mapping


Bibliographic details
Published in: IEEE Transactions on Computers, 2024-12, p. 1-13
Main authors: Hua, Yusheng; Shi, Xuanhua; He, Ligang; He, Kang; Zhang, Teng; Jin, Hai; Chen, Yong
Format: Article
Language: English
Description
Summary: Current supercomputers use an SSD-based storage layer called Burst Buffer (BB) to provide I/O-intensive applications with accelerated storage access. However, efficiently utilizing this limited and expensive storage remains a critical issue, creating an urgent need for implementing Quality of Service (QoS) in BB. To address this, we propose RuYi, a QoS-aware method to provide applications with bandwidth guarantees in the BB file system. RuYi tackles two main issues. First, it quantitatively profiles available bandwidth resources in BB to ensure reliable QoS, a crucial aspect seldom studied in the literature. Second, RuYi offers fine-grained process-level QoS via an innovative process-to-BB mapping, maximizing resource utilization, something not achievable with conventional coarse-grained compute-to-BB mapping. We evaluated RuYi on a subsystem of the leading exascale supercomputer Sunway, consisting of 4,000 compute nodes and 200 BB nodes. The experimental results demonstrate that RuYi achieves an end-to-end bandwidth control accuracy of 97%, while improving BB utilization by up to 116% compared to conventional coarse-grained compute-to-BB mapping.
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2024.3510624