ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, fac...
Published in: | Journal of computer science and technology 2020-01, Vol.35 (1), p.145-160 |
---|---|
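Main authors: | Byna, Suren; Breitenfeld, M. Scot; Dong, Bin; Koziol, Quincey; Pourmal, Elena; Robinson, Dana; Soumagne, Jerome; Tang, Houjun; Vishwanath, Venkatram; Warren, Richard |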
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 160 |
---|---|
container_issue | 1 |
container_start_page | 145 |
container_title | Journal of computer science and technology |
container_volume | 35 |
creator | Byna, Suren ; Breitenfeld, M. Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard |
description | Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology that enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries. |
doi_str_mv | 10.1007/s11390-020-9822-9 |
format | Article |
fullrecord | <record><control><sourceid>wanfang_jour_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1582374</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A718216943</galeid><wanfj_id>jsjkxjsxb_e202001009</wanfj_id><sourcerecordid>jsjkxjsxb_e202001009</sourcerecordid><originalsourceid>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</originalsourceid><addsrcrecordid>eNp1kV1P2zAYhSO0SXTAD-Au2m4XeP2RD-8OlbIiITEJuLYc53Vxljqd7bLy7-cok3qFLNmW_Rz76JwsuyRwRQDq60AIE1AAhUI0lBbiJFuQpoKC11x8SnuAdJOm0-xLCD0Aq4HzRbZeHdT69q78kd_iYN_QW7fJV8ZYbdHF_JfyahhwyO-vH_PR5YkOWg2YL8ftbh8n-Ok9RNyG8-yzUUPAi__rWfZyt3perouHx5_3y5uHQvOKxgI7RbTqaItUlYZ1peYl06BrXhqseAstNtg0gjFlgGnCRYslRaM7pUWDNTvLvs7vjiFaGbSNqF_16BzqKEnZUFbzBH2fob_KGeU2sh_33iVbsg_970MfDq1EmsKClItI-LcZ3_nxzx5DPPJUpBQJIXz6-WqmNikAaZ0Zo1c6jQ63NjlAY9P5TU0aSirBWRKQWaD9GIJHI3febpV_lwTk1JqcW5PJiJxak5MVOmvCbqoC_dHKx6J_yQOYTw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918611147</pqid></control><display><type>article</type><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><source>SpringerNature Journals</source><creator>Byna, Suren ; Breitenfeld, M. Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard</creator><creatorcontrib>Byna, Suren ; Breitenfeld, M. Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard ; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</creatorcontrib><description>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. 
Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. 
In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</description><identifier>ISSN: 1000-9000</identifier><identifier>EISSN: 1860-4749</identifier><identifier>DOI: 10.1007/s11390-020-9822-9</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Analysis ; Artificial Intelligence ; Computation ; Computer Science ; Data Structures and Information Theory ; HDF5 optimizations ; Hierarchical Data Format version 5 (HDF5) ; I/O performance ; Information storage and retrieval ; Information Systems Applications (incl.Internet) ; Libraries ; MATHEMATICS AND COMPUTING ; parallel I/O ; Regular Paper ; Software Engineering ; Theory of Computation ; virtual object layer</subject><ispartof>Journal of computer science and technology, 2020-01, Vol.35 (1), p.145-160</ispartof><rights>Institute of Computing Technology, Chinese Academy of Sciences & Springer Nature Singapore Pte Ltd. 2020</rights><rights>COPYRIGHT 2020 Springer</rights><rights>Institute of Computing Technology, Chinese Academy of Sciences & Springer Nature Singapore Pte Ltd. 2020.</rights><rights>Copyright © Wanfang Data Co. Ltd. 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</citedby><cites>FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Uhttp://www.wanfangdata.com.cn/images/PeriodicalImages/jsjkxjsxb-e/jsjkxjsxb-e.jpg</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11390-020-9822-9$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11390-020-9822-9$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://www.osti.gov/servlets/purl/1582374$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Byna, Suren</creatorcontrib><creatorcontrib>Breitenfeld, M. Scot</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Koziol, Quincey</creatorcontrib><creatorcontrib>Pourmal, Elena</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Soumagne, Jerome</creatorcontrib><creatorcontrib>Tang, Houjun</creatorcontrib><creatorcontrib>Vishwanath, Venkatram</creatorcontrib><creatorcontrib>Warren, Richard</creatorcontrib><creatorcontrib>Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</creatorcontrib><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><title>Journal of computer science and technology</title><addtitle>J. Comput. Sci. Technol</addtitle><description>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. 
Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. 
In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</description><subject>Analysis</subject><subject>Artificial Intelligence</subject><subject>Computation</subject><subject>Computer Science</subject><subject>Data Structures and Information Theory</subject><subject>HDF5 optimizations</subject><subject>Hierarchical Data Format version 5 (HDF5)</subject><subject>I/O performance</subject><subject>Information storage and retrieval</subject><subject>Information Systems Applications (incl.Internet)</subject><subject>Libraries</subject><subject>MATHEMATICS AND COMPUTING</subject><subject>parallel I/O</subject><subject>Regular Paper</subject><subject>Software Engineering</subject><subject>Theory of Computation</subject><subject>virtual object layer</subject><issn>1000-9000</issn><issn>1860-4749</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp1kV1P2zAYhSO0SXTAD-Au2m4XeP2RD-8OlbIiITEJuLYc53Vxljqd7bLy7-cok3qFLNmW_Rz76JwsuyRwRQDq60AIE1AAhUI0lBbiJFuQpoKC11x8SnuAdJOm0-xLCD0Aq4HzRbZeHdT69q78kd_iYN_QW7fJV8ZYbdHF_JfyahhwyO-vH_PR5YkOWg2YL8ftbh8n-Ok9RNyG8-yzUUPAi__rWfZyt3perouHx5_3y5uHQvOKxgI7RbTqaItUlYZ1peYl06BrXhqseAstNtg0gjFlgGnCRYslRaM7pUWDNTvLvs7vjiFaGbSNqF_16BzqKEnZUFbzBH2fob_KGeU2sh_33iVbsg_970MfDq1EmsKClItI-LcZ3_nxzx5DPPJUpBQJIXz6-WqmNikAaZ0Zo1c6jQ63NjlAY9P5TU0aSirBWRKQWaD9GIJHI3febpV_lwTk1JqcW5PJiJxak5MVOmvCbqoC_dHKx6J_yQOYTw</recordid><startdate>20200101</startdate><enddate>20200101</enddate><creator>Byna, Suren</creator><creator>Breitenfeld, M. 
Scot</creator><creator>Dong, Bin</creator><creator>Koziol, Quincey</creator><creator>Pourmal, Elena</creator><creator>Robinson, Dana</creator><creator>Soumagne, Jerome</creator><creator>Tang, Houjun</creator><creator>Vishwanath, Venkatram</creator><creator>Warren, Richard</creator><general>Springer US</general><general>Springer</general><general>Springer Nature B.V</general><general>Lawrence Berkeley National Laboratory, Berkeley, CA 94597, U.S.A.%The HDF Group, Champaign, IL 61820, U.S.A.%Argonne National Laboratory, Lemont, IL 60439, U.S.A</general><general>Springer Nature</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>Q9U</scope><scope>2B.</scope><scope>4A8</scope><scope>92I</scope><scope>93N</scope><scope>PSX</scope><scope>TCJ</scope><scope>OIOZB</scope><scope>OTOTI</scope></search><sort><creationdate>20200101</creationdate><title>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</title><author>Byna, Suren ; Breitenfeld, M. 
Scot ; Dong, Bin ; Koziol, Quincey ; Pourmal, Elena ; Robinson, Dana ; Soumagne, Jerome ; Tang, Houjun ; Vishwanath, Venkatram ; Warren, Richard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c462t-eda1cad2be2a5f3d5c453c0c745fe64b0be8e88933af03c149be52efcdac98e73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Analysis</topic><topic>Artificial Intelligence</topic><topic>Computation</topic><topic>Computer Science</topic><topic>Data Structures and Information Theory</topic><topic>HDF5 optimizations</topic><topic>Hierarchical Data Format version 5 (HDF5)</topic><topic>I/O performance</topic><topic>Information storage and retrieval</topic><topic>Information Systems Applications (incl.Internet)</topic><topic>Libraries</topic><topic>MATHEMATICS AND COMPUTING</topic><topic>parallel I/O</topic><topic>Regular Paper</topic><topic>Software Engineering</topic><topic>Theory of Computation</topic><topic>virtual object layer</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Byna, Suren</creatorcontrib><creatorcontrib>Breitenfeld, M. Scot</creatorcontrib><creatorcontrib>Dong, Bin</creatorcontrib><creatorcontrib>Koziol, Quincey</creatorcontrib><creatorcontrib>Pourmal, Elena</creatorcontrib><creatorcontrib>Robinson, Dana</creatorcontrib><creatorcontrib>Soumagne, Jerome</creatorcontrib><creatorcontrib>Tang, Houjun</creatorcontrib><creatorcontrib>Vishwanath, Venkatram</creatorcontrib><creatorcontrib>Warren, Richard</creatorcontrib><creatorcontrib>Lawrence Berkeley National Lab. 
(LBNL), Berkeley, CA (United States)</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Access via ABI/INFORM (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and 
Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>ProQuest Central Basic</collection><collection>Wanfang Data Journals - Hong Kong</collection><collection>WANFANG Data Centre</collection><collection>Wanfang Data Journals</collection><collection>万方数据期刊 - 香港版</collection><collection>China Online Journals (COJ)</collection><collection>China Online Journals (COJ)</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>Journal of computer science and technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Byna, Suren</au><au>Breitenfeld, M. Scot</au><au>Dong, Bin</au><au>Koziol, Quincey</au><au>Pourmal, Elena</au><au>Robinson, Dana</au><au>Soumagne, Jerome</au><au>Tang, Houjun</au><au>Vishwanath, Venkatram</au><au>Warren, Richard</au><aucorp>Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems</atitle><jtitle>Journal of computer science and technology</jtitle><stitle>J. Comput. Sci. 
Technol</stitle><date>2020-01-01</date><risdate>2020</risdate><volume>35</volume><issue>1</issue><spage>145</spage><epage>160</epage><pages>145-160</pages><issn>1000-9000</issn><eissn>1860-4749</eissn><abstract>Scientific applications at exascale generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology enables moving data between compute nodes and storage, faces monumental challenges from new applications, memory, and storage architectures considered in the designs of exascale systems. As the storage hierarchy is expanding to include node-local persistent memory, burst buffers, etc., as well as disk-based storage, data movement among these layers must be efficient. Parallel I/O libraries of the future should be capable of handling file sizes of many terabytes and beyond. In this paper, we describe new capabilities we have developed in Hierarchical Data Format version 5 (HDF5), the most popular parallel I/O library for scientific applications. HDF5 is one of the most used libraries at the leadership computing facilities for performing parallel I/O on existing HPC systems. The state-of-the-art features we describe include: Virtual Object Layer (VOL), Data Elevator, asynchronous I/O, full-featured single-writer and multiple-reader (Full SWMR), and parallel querying. In this paper, we introduce these features, their implementations, and the performance and feature benefits to applications and other libraries.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11390-020-9822-9</doi><tpages>16</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1000-9000 |
ispartof | Journal of computer science and technology, 2020-01, Vol.35 (1), p.145-160 |
issn | 1000-9000 ; 1860-4749 |
language | eng |
recordid | cdi_osti_scitechconnect_1582374 |
source | SpringerNature Journals |
subjects | Analysis ; Artificial Intelligence ; Computation ; Computer Science ; Data Structures and Information Theory ; HDF5 optimizations ; Hierarchical Data Format version 5 (HDF5) ; I/O performance ; Information storage and retrieval ; Information Systems Applications (incl.Internet) ; Libraries ; MATHEMATICS AND COMPUTING ; parallel I/O ; Regular Paper ; Software Engineering ; Theory of Computation ; virtual object layer |
title | ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T17%3A58%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-wanfang_jour_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ExaHDF5:%20Delivering%20Efficient%20Parallel%20I/O%20on%20Exascale%20Computing%20Systems&rft.jtitle=Journal%20of%20computer%20science%20and%20technology&rft.au=Byna,%20Suren&rft.aucorp=Lawrence%20Berkeley%20National%20Lab.%20(LBNL),%20Berkeley,%20CA%20(United%20States)&rft.date=2020-01-01&rft.volume=35&rft.issue=1&rft.spage=145&rft.epage=160&rft.pages=145-160&rft.issn=1000-9000&rft.eissn=1860-4749&rft_id=info:doi/10.1007/s11390-020-9822-9&rft_dat=%3Cwanfang_jour_osti_%3Ejsjkxjsxb_e202001009%3C/wanfang_jour_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918611147&rft_id=info:pmid/&rft_galeid=A718216943&rft_wanfj_id=jsjkxjsxb_e202001009&rfr_iscdi=true |