ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems

Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, fac...

Bibliographic details
Published in: Journal of computer science and technology 2020-01, Vol.35 (1), p.145-160
Main authors: Byna, Suren, Breitenfeld, M. Scot, Dong, Bin, Koziol, Quincey, Pourmal, Elena, Robinson, Dana, Soumagne, Jerome, Tang, Houjun, Vishwanath, Venkatram, Warren, Richard
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 160
container_issue 1
container_start_page 145
container_title Journal of computer science and technology
container_volume 35
creator Byna, Suren
Breitenfeld, M. Scot
Dong, Bin
Koziol, Quincey
Pourmal, Elena
Robinson, Dana
Soumagne, Jerome
Tang, Houjun
Vishwanath, Venkatram
Warren, Richard
description Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.
doi_str_mv 10.1007/s11390-020-9822-9
format Article
fullrecord <record><control><sourceid>wanfang_jour_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1582374</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A718216943</galeid><wanfj_id>jsjkxjsxb_e202001009</wanfj_id><sourcerecordid>jsjkxjsxb_e202001009</sourcerecordid><originalsourceid>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</originalsourceid><addsrcrecordid>eNp1kV1P2zAYhSO0SXTAD-Au2m4XeP2RD-8OlbIiITEJuLYc53Vxljqd7bLy7-cok3qFLNmW_Rz76JwsuyRwRQDq60AIE1AAhUI0lBbiJFuQpoKC11x8SnuAdJOm0-xLCD0Aq4HzRbZeHdT69q78kd_iYN_QW7fJV8ZYbdHF_JfyahhwyO-vH_PR5YkOWg2YL8ftbh8n-Ok9RNyG8-yzUUPAi__rWfZyt3perouHx5_3y5uHQvOKxgI7RbTqaItUlYZ1peYl06BrXhqseAstNtg0gjFlgGnCRYslRaM7pUWDNTvLvs7vjiFaGbSNqF_16BzqKEnZUFbzBH2fob_KGeU2sh_33iVbsg_970MfDq1EmsKClItI-LcZ3_nxzx5DPPJUpBQJIXz6-WqmNikAaZ0Zo1c6jQ63NjlAY9P5TU0aSirBWRKQWaD9GIJHI3febpV_lwTk1JqcW5PJiJxak5MVOmvCbqoC_dHKx6J_yQOYTw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918611147</pqid></control><display><type>article</type><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><source>SpringerNature Journals</source><creator>Byna, Suren ; Breitenfeld, M. Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard</creator><creatorcontrib>Byna, Suren ; Breitenfeld, M. Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard ; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</creatorcontrib><description>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. 
Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. 
In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</description><identifier>ISSN: 1000-9000</identifier><identifier>EISSN: 1860-4749</identifier><identifier>DOI: 10.1007/s11390-020-9822-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Analysis ; Artificial Intelligence ; Computation ; Computer Science ; Data Structures and Information Theory ; HDF5 optimizations ; Hierarchical Data Format version 5 (HDF5) ; I/O performance ; Information storage and retrieval ; Information Systems Applications (incl.Internet) ; Libraries ; MATHEMATICS AND COMPUTING ; parallel I/O ; Regular Paper ; Software Engineering ; Theory of Computation ; virtual object layer</subject><ispartof>Journal of computer science and technology, 2020-01, Vol.35 (1), p.145-160</ispartof><rights>Institute of Computing Technology, Chinese Academy of Sciences &amp; Springer Nature Singapore Pte Ltd. 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Institute of Computing Technology, Chinese Academy of Sciences &amp; Springer Nature Singapore Pte Ltd. 2020.</rights><rights>Copyright © Wanfang Data Co. Ltd. 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</citedby><cites>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/jsjkxjsxb-e/jsjkxjsxb-e.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11390-020-9822-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11390-020-9822-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1582374$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Byna, Suren</creatorcontrib><creatorcontrib>Breitenfeld, M. Scot</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Koziol, Quincey</creatorcontrib><creatorcontrib>Pourmal, Elena</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Soumagne, Jerome</creatorcontrib><creatorcontrib>Tang, Houjun</creatorcontrib><creatorcontrib>Vishwanath, Venkatram</creatorcontrib><creatorcontrib>Warren, Richard</creatorcontrib><creatorcontrib>Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</creatorcontrib><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><title>Journal of computer science and technology</title><addtitle>J. Comput. Sci. Technol</addtitle><description>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. 
Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. 
In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</description><subject>Analysis</subject><subject>Artificial Intelligence</subject><subject>Computation</subject><subject>Computer Science</subject><subject>Data Structures and Information Theory</subject><subject>HDF5 optimizations</subject><subject>Hierarchical Data Format version 5 (HDF5)</subject><subject>I/O performance</subject><subject>Information storage and retrieval</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Libraries</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>parallel I/O</subject><subject>Regular Paper</subject><subject>Software Engineering</subject><subject>Theory of Computation</subject><subject>virtual object layer</subject><issn>1000-9000</issn><issn>1860-4749</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kV1P2zAYhSO0SXTAD-Au2m4XeP2RD-8OlbIiITEJuLYc53Vxljqd7bLy7-cok3qFLNmW_Rz76JwsuyRwRQDq60AIE1AAhUI0lBbiJFuQpoKC11x8SnuAdJOm0-xLCD0Aq4HzRbZeHdT69q78kd_iYN_QW7fJV8ZYbdHF_JfyahhwyO-vH_PR5YkOWg2YL8ftbh8n-Ok9RNyG8-yzUUPAi__rWfZyt3perouHx5_3y5uHQvOKxgI7RbTqaItUlYZ1peYl06BrXhqseAstNtg0gjFlgGnCRYslRaM7pUWDNTvLvs7vjiFaGbSNqF_16BzqKEnZUFbzBH2fob_KGeU2sh_33iVbsg_970MfDq1EmsKClItI-LcZ3_nxzx5DPPJUpBQJIXz6-WqmNikAaZ0Zo1c6jQ63NjlAY9P5TU0aSirBWRKQWaD9GIJHI3febpV_lwTk1JqcW5PJiJxak5MVOmvCbqoC_dHKx6J_yQOYTw</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Byna, Suren</creator><creator>Breitenfeld, M. 
Scot</creator><creator>Dong, Bin</creator><creator>Koziol, Quincey</creator><creator>Pourmal, Elena</creator><creator>Robinson, Dana</creator><creator>Soumagne, Jerome</creator><creator>Tang, Houjun</creator><creator>Vishwanath, Venkatram</creator><creator>Warren, Richard</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><general>Lawrence Berkeley National Laboratory, Berkeley, CA 94597, U.S.A.%The HDF Group, Champaign, IL 61820, U.S.A.%Argonne National Laboratory, Lemont, IL 60439, U.S.A</general><general>Springer Nature</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope><scope>OIOZB</scope><scope>OTOTI</scope></search><sort><creationdate>20200101</creationdate><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><author>Byna, Suren ; Breitenfeld, M. 
Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Analysis</topic><topic>Artificial Intelligence</topic><topic>Computation</topic><topic>Computer Science</topic><topic>Data Structures and Information Theory</topic><topic>HDF5 optimizations</topic><topic>Hierarchical Data Format version 5 (HDF5)</topic><topic>I/O performance</topic><topic>Information storage and retrieval</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Libraries</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>parallel I/O</topic><topic>Regular Paper</topic><topic>Software Engineering</topic><topic>Theory of Computation</topic><topic>virtual object layer</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Byna, Suren</creatorcontrib><creatorcontrib>Breitenfeld, M. Scot</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Koziol, Quincey</creatorcontrib><creatorcontrib>Pourmal, Elena</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Soumagne, Jerome</creatorcontrib><creatorcontrib>Tang, Houjun</creatorcontrib><creatorcontrib>Vishwanath, Venkatram</creatorcontrib><creatorcontrib>Warren, Richard</creatorcontrib><creatorcontrib>Lawrence Berkeley National Lab. 
(LBNL), Berkeley, CA (United States)</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Journal of computer science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Byna, Suren</au><au>Breitenfeld, M. Scot</au><au>Dong, Bin</au><au>Koziol, Quincey</au><au>Pourmal, Elena</au><au>Robinson, Dana</au><au>Soumagne, Jerome</au><au>Tang, Houjun</au><au>Vishwanath, Venkatram</au><au>Warren, Richard</au><aucorp>Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</atitle><jtitle>Journal of computer science and technology</jtitle><stitle>J. Comput. Sci. 
Technol</stitle><date>2020-01-01</date><risdate>2020</risdate><volume>35</volume><issue>1</issue><spage>145</spage><epage>160</epage><pages>145-160</pages><issn>1000-9000</issn><eissn>1860-4749</eissn><abstract>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11390-020-9822-9</doi><tpages>16</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1000-9000
ispartof Journal of computer science and technology, 2020-01, Vol.35 (1), p.145-160
issn 1000-9000
1860-4749
language eng
recordid cdi_osti_scitechconnect_1582374
source SpringerNature Journals
subjects Analysis
Artificial Intelligence
Computation
Computer Science
Data Structures and Information Theory
HDF5 optimizations
Hierarchical Data Format version 5 (HDF5)
I/O performance
Information storage and retrieval
Information Systems Applications (incl.Internet)
Libraries
MATHEMATICS AND COMPUTING
parallel I/O
Regular Paper
Software Engineering
Theory of Computation
virtual object layer
title ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A58%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ExaHDF5:%20Delivering%20Efficient%20Parallel%20I/O%20on%20Exascale%20Computing%20Systems&rft.jtitle=Journal%20of%20computer%20science%20and%20technology&rft.au=Byna,%20Suren&rft.aucorp=Lawrence%20Berkeley%20National%20Lab.%20(LBNL),%20Berkeley,%20CA%20(United%20States)&rft.date=2020-01-01&rft.volume=35&rft.issue=1&rft.spage=145&rft.epage=160&rft.pages=145-160&rft.issn=1000-9000&rft.eissn=1860-4749&rft_id=info:doi/10.1007/s11390-020-9822-9&rft_dat=%3Cwanfang_jour_osti_%3Ejsjkxjsxb_e202001009%3C/wanfang_jour_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918611147&rft_id=info:pmid/&rft_galeid=A718216943&rft_wanfj_id=jsjkxjsxb_e202001009&rfr_iscdi=true