CernVM-FS at Extreme Scales

Bibliographic Details
Main Authors: Promberger, Laura; Blomer, Jakob; Völkl, Valentin; Harvey, Matt
Format: Conference proceedings
Language: English
Subjects:
Online Access: Full text
Description
Summary: The CernVM File System (CVMFS) provides the software distribution backbone for High Energy and Nuclear Physics experiments and many other scientific communities in the form of a globally available shared software area. It was designed for the software distribution problem of experiment software for LHC Runs 1 and 2. For LHC Run 3, and even more so for HL-LHC (Runs 4-6), the complexity of the experiment software stacks and their build pipelines is substantially larger. For instance, software is now distributed for several CPU architectures, often in the form of containers that include base and operating system libraries; the number of external packages such as machine learning libraries has multiplied; and there is a shift from C++ to more Python-heavy software stacks, which results in more and smaller files needing to be distributed. For CVMFS, the new software landscape means an order-of-magnitude increase in scale in several key metrics. This contribution reports on the performance and reliability engineering of the file system client to sustain current and expected future software access load. Concretely, the impact of the newly designed file system cache management is shown, including significant performance improvements for HEP-representative benchmark workloads and an up to 25% performance increase in software build time when the build tools reside on CVMFS. Operational improvements presented include better network failure handling, error reporting, and integration with container runtimes. Finally, a pilot study using zstd as the compression algorithm shows that it could bring significant improvements in remote data access times.
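
As an illustration of the kind of comparison the zstd pilot study describes, the Python sketch below measures compression ratio and (de)compression time for zlib, which CVMFS has traditionally used for its objects, against zstd. It assumes the third-party zstandard package; the input file, compression levels, and timing method are placeholders for illustration only, not the benchmark setup from the paper.

# Illustrative sketch only: compare zlib and zstd on a single file.
# Input path and compression levels are hypothetical placeholders.
import time
import zlib
import zstandard as zstd  # pip install zstandard

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

data = open("some_software_file.so", "rb").read()  # hypothetical input file

# zlib (DEFLATE) at its default-ish level 6
z_blob, z_ct = timed(zlib.compress, data, 6)
_, z_dt = timed(zlib.decompress, z_blob)

# zstd at a comparable level
cctx = zstd.ZstdCompressor(level=3)
dctx = zstd.ZstdDecompressor()
s_blob, s_ct = timed(cctx.compress, data)
_, s_dt = timed(dctx.decompress, s_blob)

print(f"zlib: ratio {len(z_blob)/len(data):.2f}, compress {z_ct:.3f}s, decompress {z_dt:.3f}s")
print(f"zstd: ratio {len(s_blob)/len(data):.2f}, compress {s_ct:.3f}s, decompress {s_dt:.3f}s")
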
ISSN: 2100-014X
2101-6275
DOI: 10.1051/epjconf/202429504012