The Design of Fast Delta Encoding for Delta Compression Based Storage Systems

Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is co...

Detailed Description

Bibliographic Details
Published in: ACM transactions on storage, 2024-11, Vol. 20 (4), p. 1-30, Article 23
Main Authors: Tan, Haoliang; Xia, Wen; Zou, Xiangyu; Deng, Cai; Liao, Qing; Gu, Zhaoquan
Format: Article
Language: eng
Subjects: Data compression; Data compression systems; Information systems; Theory of computation
Online Access: Full text
container_end_page 30
container_issue 4
container_start_page 1
container_title ACM transactions on storage
container_volume 20
creator Tan, Haoliang
Xia, Wen
Zou, Xiangyu
Deng, Cai
Liao, Qing
Gu, Zhaoquan
description Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.
doi_str_mv 10.1145/3664817
format Article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3664817</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3664817</sourcerecordid><originalsourceid>FETCH-LOGICAL-a169t-bea399c061dc9c8714edd0ca7e34b0bc317903000f6cc23ed173d793225fb98c3</originalsourceid><addsrcrecordid>eNo9kDFPwzAQRi0EEqUgdiZvTIFzLrHjEUILSEUMLXPk2JcQ1MSVnaX_nqKGTnf3fU83PMZuBTwIkeWPKGVWCHXGZiLPMUHQeH7albpkVzH-AKBMs3zGPjbfxF8odu3AfcOXJo6Hczsavhisd93Q8saHKSp9vwsUY-cH_mwiOb4efTAt8fU-jtTHa3bRmG2km2nO2ddysSnfktXn63v5tEqMkHpMajKotQUpnNW2UCIj58AaRZjVUFsUSgMCQCOtTZGcUOiUxjTNm1oXFufs_vjXBh9joKbaha43YV8JqP4sVJOFA3l3JI3tT9B_-QsqHVY_</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>The Design of Fast Delta Encoding for Delta Compression Based Storage Systems</title><source>ACM Digital Library Complete</source><creator>Tan, Haoliang ; Xia, Wen ; Zou, Xiangyu ; Deng, Cai ; Liao, Qing ; Gu, Zhaoquan</creator><creatorcontrib>Tan, Haoliang ; Xia, Wen ; Zou, Xiangyu ; Deng, Cai ; Liao, Qing ; Gu, Zhaoquan</creatorcontrib><description>Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. 
The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.</description><identifier>ISSN: 1553-3077</identifier><identifier>EISSN: 1553-3093</identifier><identifier>DOI: 10.1145/3664817</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Data compression ; Data compression systems ; Information systems ; Theory of computation</subject><ispartof>ACM transactions on storage, 2024-11, Vol.20 (4), p.1-30, Article 23</ispartof><rights>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Request permissions from</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a169t-bea399c061dc9c8714edd0ca7e34b0bc317903000f6cc23ed173d793225fb98c3</cites><orcidid>0000-0002-7717-6990 ; 0000-0003-4093-6391 ; 0000-0003-1012-5301 ; 0000-0001-7546-852X ; 0009-0003-9919-1926 ; 0000-0001-5104-8301</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://dl.acm.org/doi/pdf/10.1145/3664817$$EPDF$$P50$$Gacm$$H</linktopdf><link.rule.ids>314,776,780,2276,27901,27902,40172,75971</link.rule.ids></links><search><creatorcontrib>Tan, Haoliang</creatorcontrib><creatorcontrib>Xia, Wen</creatorcontrib><creatorcontrib>Zou, Xiangyu</creatorcontrib><creatorcontrib>Deng, Cai</creatorcontrib><creatorcontrib>Liao, Qing</creatorcontrib><creatorcontrib>Gu, Zhaoquan</creatorcontrib><title>The Design of Fast Delta Encoding for Delta Compression Based Storage Systems</title><title>ACM transactions on storage</title><addtitle>ACM TOS</addtitle><description>Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. 
The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.</description><subject>Data compression</subject><subject>Data compression systems</subject><subject>Information systems</subject><subject>Theory of computation</subject><issn>1553-3077</issn><issn>1553-3093</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNo9kDFPwzAQRi0EEqUgdiZvTIFzLrHjEUILSEUMLXPk2JcQ1MSVnaX_nqKGTnf3fU83PMZuBTwIkeWPKGVWCHXGZiLPMUHQeH7albpkVzH-AKBMs3zGPjbfxF8odu3AfcOXJo6Hczsavhisd93Q8saHKSp9vwsUY-cH_mwiOb4efTAt8fU-jtTHa3bRmG2km2nO2ddysSnfktXn63v5tEqMkHpMajKotQUpnNW2UCIj58AaRZjVUFsUSgMCQCOtTZGcUOiUxjTNm1oXFufs_vjXBh9joKbaha43YV8JqP4sVJOFA3l3JI3tT9B_-QsqHVY_</recordid><startdate>20241130</startdate><enddate>20241130</enddate><creator>Tan, Haoliang</creator><creator>Xia, Wen</creator><creator>Zou, Xiangyu</creator><creator>Deng, Cai</creator><creator>Liao, Qing</creator><creator>Gu, 
Zhaoquan</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-7717-6990</orcidid><orcidid>https://orcid.org/0000-0003-4093-6391</orcidid><orcidid>https://orcid.org/0000-0003-1012-5301</orcidid><orcidid>https://orcid.org/0000-0001-7546-852X</orcidid><orcidid>https://orcid.org/0009-0003-9919-1926</orcidid><orcidid>https://orcid.org/0000-0001-5104-8301</orcidid></search><sort><creationdate>20241130</creationdate><title>The Design of Fast Delta Encoding for Delta Compression Based Storage Systems</title><author>Tan, Haoliang ; Xia, Wen ; Zou, Xiangyu ; Deng, Cai ; Liao, Qing ; Gu, Zhaoquan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a169t-bea399c061dc9c8714edd0ca7e34b0bc317903000f6cc23ed173d793225fb98c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Data compression</topic><topic>Data compression systems</topic><topic>Information systems</topic><topic>Theory of computation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tan, Haoliang</creatorcontrib><creatorcontrib>Xia, Wen</creatorcontrib><creatorcontrib>Zou, Xiangyu</creatorcontrib><creatorcontrib>Deng, Cai</creatorcontrib><creatorcontrib>Liao, Qing</creatorcontrib><creatorcontrib>Gu, Zhaoquan</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on storage</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tan, Haoliang</au><au>Xia, Wen</au><au>Zou, Xiangyu</au><au>Deng, Cai</au><au>Liao, Qing</au><au>Gu, Zhaoquan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The Design of Fast Delta Encoding for Delta Compression Based Storage Systems</atitle><jtitle>ACM transactions on storage</jtitle><stitle>ACM 
TOS</stitle><date>2024-11-30</date><risdate>2024</risdate><volume>20</volume><issue>4</issue><spage>1</spage><epage>30</epage><pages>1-30</pages><artnum>23</artnum><issn>1553-3077</issn><eissn>1553-3093</eissn><abstract>Delta encoding is a data reduction technique capable of calculating the differences (i.e., delta) among very similar files and chunks. It is widely used for various applications, such as synchronization replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace Adler32 hash for fast scanning overlapping words of similar chunks, (2) adopting a quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of traditional building full indexes for base chunks’ words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, further batch compressing the remainder to improve the compression ratio. 
Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3664817</doi><tpages>30</tpages><orcidid>https://orcid.org/0000-0002-7717-6990</orcidid><orcidid>https://orcid.org/0000-0003-4093-6391</orcidid><orcidid>https://orcid.org/0000-0003-1012-5301</orcidid><orcidid>https://orcid.org/0000-0001-7546-852X</orcidid><orcidid>https://orcid.org/0009-0003-9919-1926</orcidid><orcidid>https://orcid.org/0000-0001-5104-8301</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-3077
ispartof ACM transactions on storage, 2024-11, Vol.20 (4), p.1-30, Article 23
issn 1553-3077
1553-3093
language eng
recordid cdi_crossref_primary_10_1145_3664817
source ACM Digital Library Complete
subjects Data compression
Data compression systems
Information systems
Theory of computation
title The Design of Fast Delta Encoding for Delta Compression Based Storage Systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T15%3A27%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Design%20of%20Fast%20Delta%20Encoding%20for%20Delta%20Compression%20Based%20Storage%20Systems&rft.jtitle=ACM%20transactions%20on%20storage&rft.au=Tan,%20Haoliang&rft.date=2024-11-30&rft.volume=20&rft.issue=4&rft.spage=1&rft.epage=30&rft.pages=1-30&rft.artnum=23&rft.issn=1553-3077&rft.eissn=1553-3093&rft_id=info:doi/10.1145/3664817&rft_dat=%3Cacm_cross%3E3664817%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
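
The description field above names a Gear-based rolling hash, array-based indexing, and sampled indexing of the base chunk's words. The following is a minimal Python sketch of how those three ideas can fit together. It is not the authors' implementation: the word length `WORD`, the per-byte shift, the index size `INDEX_BITS`, the sampling step `SAMPLE`, the randomly generated `GEAR` table, and all function names are assumptions chosen for illustration.

```python
import random

WORD = 16                 # assumed word (window) length in bytes
SHIFT = 64 // WORD        # bits shifted per byte; a byte leaves the hash after WORD steps
MASK64 = (1 << 64) - 1    # keep the hash within 64 bits
INDEX_BITS = 16           # assumed index size: 2**16 array slots
SAMPLE = 4                # assumed sampling step: index every 4th base word

# Hypothetical Gear table: one fixed pseudo-random 64-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def gear_hashes(data):
    """Yield (start_offset, hash) for every overlapping WORD-byte window."""
    h = 0
    for i, b in enumerate(data):
        # One shift and one add per byte; old bytes fall off the top of the hash.
        h = ((h << SHIFT) + GEAR[b]) & MASK64
        if i + 1 >= WORD:
            yield i + 1 - WORD, h


def slot(h):
    """Map a hash to an array slot using its high bits, which cover the whole window."""
    return h >> (64 - INDEX_BITS)


def build_sampled_index(base):
    """Array-based index of the base chunk, storing only every SAMPLE-th word."""
    index = [-1] * (1 << INDEX_BITS)
    for off, h in gear_hashes(base):
        if off % SAMPLE == 0:
            index[slot(h)] = off   # a later word overwrites an earlier one on collision
    return index


def find_match(index, base, target, t_off, h):
    """Return the base offset whose word equals the target word at t_off, or -1."""
    b_off = index[slot(h)]
    # The array lookup is only a hint, so the candidate is verified byte by byte.
    if b_off >= 0 and base[b_off:b_off + WORD] == target[t_off:t_off + WORD]:
        return b_off
    return -1
```

A caller would build the index once per base chunk with `build_sampled_index(base)`, then walk `gear_hashes(target)` and pass each offset and hash to `find_match`. A verified hit would typically be extended into a longer copy instruction, and a miss would leave the bytes as literal data. The skipping and batch-compression steps mentioned in the description are not shown here.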