A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data

In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. In particular, we aim to identify a reduced model that can be utilized to transform the orig...

Bibliographic details
Published in:	IEEE transactions on big data 2023-06, Vol.9 (3), p.949-963
Main authors:	Luo, Huizhang; Wang, Junqi; Qin, Zhenlu; Huang, Dan; Liu, Qing; Zhou, Mengchu; Jiang, Hong
Format:	Article
Language:	eng
Subjects:	Adaptation models; Analytical models; Compressibility; Compression ratio; compressor selection; Compressors; Computational modeling; Computer Science; Data models; data preconditioning; Data reduction; Discrete Wavelet Transform; high-performance computing; Mathematical models; Preconditioning; Principal components analysis; Singular value decomposition; Transforms; Wavelet transforms

container_end_page	963
container_issue	3
container_start_page	949
container_title	IEEE transactions on big data
container_volume	9
creator	Luo, Huizhang; Wang, Junqi; Qin, Zhenlu; Huang, Dan; Liu, Qing; Zhou, Mengchu; Jiang, Hong
description	In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. In particular, we aim to identify a reduced model that can be utilized to transform the original data into a more compressible form. We begin with two PDE applications as a proof of concept, in which we demonstrate that a reduced model can indeed reside in the full model output, and can be utilized to improve compression ratios. A mathematical proof is also presented to show how the compression ratio is improved by the reduced model. We further explore more general dimension reduction techniques to extract the reduced model, including principal component analysis, singular value decomposition, and discrete wavelet transform. After preconditioning, the reduced model in conjunction with difference between the reduced model and full model is stored, which results in higher compression ratios. We evaluate the reduced models on ten scientific datasets, and the results show the effectiveness of our approaches. Given that there is no single method that consistently achieves the best performance, we further propose a selection strategy that guides users to select the best reduced model prior to data reduction.
doi_str_mv	10.1109/TBDATA.2022.3225959
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2812842044</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9968123</ieee_id><sourcerecordid>2812842044</sourcerecordid><originalsourceid>FETCH-LOGICAL-c274t-e96625b2d9e0b83e6723d63c2dcfbcfabc4931f48b1721e8a23a459df2007c723</originalsourceid><addsrcrecordid>eNo9kU9LxDAQxYsoKOon2EvQc9dk0m2bY13_rLCi6HoOaTLViDY1yQr77U2teJph-L2ZebwsmzE6Z4yKi83lVbNp5kAB5hxgIRZiLzsCXkEOVJT7Y88hrypBD7PTEN4ppayklAs4yoaGXKmocuPtN_akGQbvlH4j0ZGV8t8You1fyVpF7CN5QrPVaMi9M_gRRubRo3a9sdG6nqxdCDuydJ-DxxDGSec8edY2aW1n9e-lk-ygUx8BT__qcfZyc71ZrvL1w-3dslnnGqoi5ijKEhYtGIG0rTmWFXBTcg1Gd63uVKsLwVlX1C2rgGGtgKtiIUwHlFY6wcfZ2bTXJQsyaBtRv6Vfe9RRQgFciDpB5xOUXH9tk1n57ra-T39JqBnUBdCiSBSfKO2TQ4-dHLz9VH4nGZVjBHKKQI4RyL8Ikmo2qSwi_iuEKNNizn8AEGaCUA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2812842044</pqid></control><display><type>article</type><title>A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data</title><source>IEEE Electronic Library (IEL)</source><creator>Luo, Huizhang ; Wang, Junqi ; Qin, Zhenlu ; Huang, Dan ; Liu, Qing ; Zhou, Mengchu ; Jiang, Hong</creator><creatorcontrib>Luo, Huizhang ; Wang, Junqi ; Qin, Zhenlu ; Huang, Dan ; Liu, Qing ; Zhou, Mengchu ; Jiang, Hong ; Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><description>In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. In particular, we aim to identify a reduced model that can be utilized to transform the original data into a more compressible form. 
We begin with two PDE applications as a proof of concept, in which we demonstrate that a reduced model can indeed reside in the full model output, and can be utilized to improve compression ratios. A mathematical proof is also presented to show how the compression ratio is improved by the reduced model. We further explore more general dimension reduction techniques to extract the reduced model, including principal component analysis, singular value decomposition, and discrete wavelet transform. After preconditioning, the reduced model in conjunction with difference between the reduced model and full model is stored, which results in higher compression ratios. We evaluate the reduced models on ten scientific datasets, and the results show the effectiveness of our approaches. Given that there is no single method that consistently achieves the best performance, we further propose a selection strategy that guides users to select the best reduced model prior to data reduction.</description><identifier>ISSN: 2332-7790</identifier><identifier>ISSN: 2372-2096</identifier><identifier>EISSN: 2372-2096</identifier><identifier>DOI: 10.1109/TBDATA.2022.3225959</identifier><identifier>CODEN: ITBDAX</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Adaptation models ; Analytical models ; Compressibility ; Compression ratio ; compressor selection ; Compressors ; Computational modeling ; Computer Science ; Data models ; data preconditioning ; Data reduction ; Discrete Wavelet Transform ; high-performance computing ; Mathematical models ; Preconditioning ; Principal components analysis ; Singular value decomposition ; Transforms ; Wavelet transforms</subject><ispartof>IEEE transactions on big data, 2023-06, Vol.9 (3), p.949-963</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c274t-e96625b2d9e0b83e6723d63c2dcfbcfabc4931f48b1721e8a23a459df2007c723</cites><orcidid>0000-0003-2392-0267 ; 0000-0001-5582-1031 ; 0000-0002-3427-366X ; 0000-0002-1477-9751 ; 0000-0002-0408-9853 ; 0000-0002-7600-7976 ; 0000-0002-5408-8752 ; 0000000155821031 ; 0000000204089853 ; 0000000254088752 ; 0000000323920267 ; 000000023427366X ; 0000000214779751 ; 0000000276007976</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9968123$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,315,781,785,797,886,27928,27929,54762</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9968123$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.osti.gov/biblio/2423998$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Luo, Huizhang</creatorcontrib><creatorcontrib>Wang, Junqi</creatorcontrib><creatorcontrib>Qin, Zhenlu</creatorcontrib><creatorcontrib>Huang, Dan</creatorcontrib><creatorcontrib>Liu, Qing</creatorcontrib><creatorcontrib>Zhou, Mengchu</creatorcontrib><creatorcontrib>Jiang, Hong</creatorcontrib><creatorcontrib>Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><title>A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data</title><title>IEEE transactions on big data</title><addtitle>TBData</addtitle><description>In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. 
In particular, we aim to identify a reduced model that can be utilized to transform the original data into a more compressible form. We begin with two PDE applications as a proof of concept, in which we demonstrate that a reduced model can indeed reside in the full model output, and can be utilized to improve compression ratios. A mathematical proof is also presented to show how the compression ratio is improved by the reduced model. We further explore more general dimension reduction techniques to extract the reduced model, including principal component analysis, singular value decomposition, and discrete wavelet transform. After preconditioning, the reduced model in conjunction with difference between the reduced model and full model is stored, which results in higher compression ratios. We evaluate the reduced models on ten scientific datasets, and the results show the effectiveness of our approaches. Given that there is no single method that consistently achieves the best performance, we further propose a selection strategy that guides users to select the best reduced model prior to data reduction.</description><subject>Adaptation models</subject><subject>Analytical models</subject><subject>Compressibility</subject><subject>Compression ratio</subject><subject>compressor selection</subject><subject>Compressors</subject><subject>Computational modeling</subject><subject>Computer Science</subject><subject>Data models</subject><subject>data preconditioning</subject><subject>Data reduction</subject><subject>Discrete Wavelet Transform</subject><subject>high-performance computing</subject><subject>Mathematical models</subject><subject>Preconditioning</subject><subject>Principal components analysis</subject><subject>Singular value decomposition</subject><subject>Transforms</subject><subject>Wavelet 
transforms</subject><issn>2332-7790</issn><issn>2372-2096</issn><issn>2372-2096</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kU9LxDAQxYsoKOon2EvQc9dk0m2bY13_rLCi6HoOaTLViDY1yQr77U2teJph-L2ZebwsmzE6Z4yKi83lVbNp5kAB5hxgIRZiLzsCXkEOVJT7Y88hrypBD7PTEN4ppayklAs4yoaGXKmocuPtN_akGQbvlH4j0ZGV8t8You1fyVpF7CN5QrPVaMi9M_gRRubRo3a9sdG6nqxdCDuydJ-DxxDGSec8edY2aW1n9e-lk-ygUx8BT__qcfZyc71ZrvL1w-3dslnnGqoi5ijKEhYtGIG0rTmWFXBTcg1Gd63uVKsLwVlX1C2rgGGtgKtiIUwHlFY6wcfZ2bTXJQsyaBtRv6Vfe9RRQgFciDpB5xOUXH9tk1n57ra-T39JqBnUBdCiSBSfKO2TQ4-dHLz9VH4nGZVjBHKKQI4RyL8Ikmo2qSwi_iuEKNNizn8AEGaCUA</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Luo, Huizhang</creator><creator>Wang, Junqi</creator><creator>Qin, Zhenlu</creator><creator>Huang, Dan</creator><creator>Liu, Qing</creator><creator>Zhou, Mengchu</creator><creator>Jiang, Hong</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0003-2392-0267</orcidid><orcidid>https://orcid.org/0000-0001-5582-1031</orcidid><orcidid>https://orcid.org/0000-0002-3427-366X</orcidid><orcidid>https://orcid.org/0000-0002-1477-9751</orcidid><orcidid>https://orcid.org/0000-0002-0408-9853</orcidid><orcidid>https://orcid.org/0000-0002-7600-7976</orcidid><orcidid>https://orcid.org/0000-0002-5408-8752</orcidid><orcidid>https://orcid.org/0000000155821031</orcidid><orcidid>https://orcid.org/0000000204089853</orcidid><orcidid>https://orcid.org/0000000254088752</orcidid><orcidid>https://orcid.org/0000000323920267</orcidid><orcidid>https://orcid.org/000000023427366X</orcidid><orcidid>https://orcid.org/0000000214779751</orcidid><orcidid>https://orcid.org/0000000276007976</orcidid></search><sort><creationdate>20230601</creationdate><title>A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data</title><author>Luo, Huizhang ; Wang, Junqi ; Qin, Zhenlu ; Huang, Dan ; Liu, Qing ; Zhou, Mengchu ; Jiang, Hong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c274t-e96625b2d9e0b83e6723d63c2dcfbcfabc4931f48b1721e8a23a459df2007c723</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Adaptation models</topic><topic>Analytical models</topic><topic>Compressibility</topic><topic>Compression ratio</topic><topic>compressor selection</topic><topic>Compressors</topic><topic>Computational modeling</topic><topic>Computer Science</topic><topic>Data models</topic><topic>data preconditioning</topic><topic>Data reduction</topic><topic>Discrete Wavelet Transform</topic><topic>high-performance computing</topic><topic>Mathematical 
models</topic><topic>Preconditioning</topic><topic>Principal components analysis</topic><topic>Singular value decomposition</topic><topic>Transforms</topic><topic>Wavelet transforms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Luo, Huizhang</creatorcontrib><creatorcontrib>Wang, Junqi</creatorcontrib><creatorcontrib>Qin, Zhenlu</creatorcontrib><creatorcontrib>Huang, Dan</creatorcontrib><creatorcontrib>Liu, Qing</creatorcontrib><creatorcontrib>Zhou, Mengchu</creatorcontrib><creatorcontrib>Jiang, Hong</creatorcontrib><creatorcontrib>Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>OSTI.GOV</collection><jtitle>IEEE transactions on big data</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Luo, Huizhang</au><au>Wang, Junqi</au><au>Qin, Zhenlu</au><au>Huang, Dan</au><au>Liu, Qing</au><au>Zhou, Mengchu</au><au>Jiang, Hong</au><aucorp>Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). 
Oak Ridge Leadership Computing Facility (OLCF)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data</atitle><jtitle>IEEE transactions on big data</jtitle><stitle>TBData</stitle><date>2023-06-01</date><risdate>2023</risdate><volume>9</volume><issue>3</issue><spage>949</spage><epage>963</epage><pages>949-963</pages><issn>2332-7790</issn><issn>2372-2096</issn><eissn>2372-2096</eissn><coden>ITBDAX</coden><abstract>In this paper, we propose and evaluate the idea that data need to be preconditioned prior to compression, such that they can better match the design philosophies of lossy compressors for HPC scientific data. In particular, we aim to identify a reduced model that can be utilized to transform the original data into a more compressible form. We begin with two PDE applications as a proof of concept, in which we demonstrate that a reduced model can indeed reside in the full model output, and can be utilized to improve compression ratios. A mathematical proof is also presented to show how the compression ratio is improved by the reduced model. We further explore more general dimension reduction techniques to extract the reduced model, including principal component analysis, singular value decomposition, and discrete wavelet transform. After preconditioning, the reduced model in conjunction with difference between the reduced model and full model is stored, which results in higher compression ratios. We evaluate the reduced models on ten scientific datasets, and the results show the effectiveness of our approaches. 
Given that there is no single method that consistently achieves the best performance, we further propose a selection strategy that guides users to select the best reduced model prior to data reduction.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TBDATA.2022.3225959</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0003-2392-0267</orcidid><orcidid>https://orcid.org/0000-0001-5582-1031</orcidid><orcidid>https://orcid.org/0000-0002-3427-366X</orcidid><orcidid>https://orcid.org/0000-0002-1477-9751</orcidid><orcidid>https://orcid.org/0000-0002-0408-9853</orcidid><orcidid>https://orcid.org/0000-0002-7600-7976</orcidid><orcidid>https://orcid.org/0000-0002-5408-8752</orcidid><orcidid>https://orcid.org/0000000155821031</orcidid><orcidid>https://orcid.org/0000000204089853</orcidid><orcidid>https://orcid.org/0000000254088752</orcidid><orcidid>https://orcid.org/0000000323920267</orcidid><orcidid>https://orcid.org/000000023427366X</orcidid><orcidid>https://orcid.org/0000000214779751</orcidid><orcidid>https://orcid.org/0000000276007976</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 2332-7790
ispartof	IEEE transactions on big data, 2023-06, Vol.9 (3), p.949-963
issn	2332-7790 2372-2096 2372-2096
language	eng
recordid	cdi_proquest_journals_2812842044
source	IEEE Electronic Library (IEL)
subjects	Adaptation models; Analytical models; Compressibility; Compression ratio; compressor selection; Compressors; Computational modeling; Computer Science; Data models; data preconditioning; Data reduction; Discrete Wavelet Transform; high-performance computing; Mathematical models; Preconditioning; Principal components analysis; Singular value decomposition; Transforms; Wavelet transforms
title	A Data-driven Approach to Harvesting Latent Reduced Models to Precondition Lossy Compression for Scientific Data
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T04%3A13%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Data-driven%20Approach%20to%20Harvesting%20Latent%20Reduced%20Models%20to%20Precondition%20Lossy%20Compression%20for%20Scientific%20Data&rft.jtitle=IEEE%20transactions%20on%20big%20data&rft.au=Luo,%20Huizhang&rft.aucorp=Oak%20Ridge%20National%20Laboratory%20(ORNL),%20Oak%20Ridge,%20TN%20(United%20States).%20Oak%20Ridge%20Leadership%20Computing%20Facility%20(OLCF)&rft.date=2023-06-01&rft.volume=9&rft.issue=3&rft.spage=949&rft.epage=963&rft.pages=949-963&rft.issn=2332-7790&rft.eissn=2372-2096&rft.coden=ITBDAX&rft_id=info:doi/10.1109/TBDATA.2022.3225959&rft_dat=%3Cproquest_RIE%3E2812842044%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2812842044&rft_id=info:pmid/&rft_ieee_id=9968123&rfr_iscdi=true