Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression

Data deduplication, though efficient at removing duplicate chunks, introduces chunk fragmentation, which decreases restore performance. Rewriting algorithms have been proposed to reduce this fragmentation. Delta compression is often used as a complement to data deduplication to further improve...

Detailed description

Bibliographic details
Published in: IEEE transactions on parallel and distributed systems 2020-10, Vol.31 (10), p.2302-2314
Main authors: Zhang, Yucheng; Yuan, Ye; Feng, Dan; Wang, Chunzhi; Wu, Xinyun; Yan, Lingyu; Pan, Deng; Wang, Shuanghong
Format: Article
Language: eng
container_end_page 2314
container_issue 10
container_start_page 2302
container_title IEEE transactions on parallel and distributed systems
container_volume 31
creator Zhang, Yucheng
Yuan, Ye
Feng, Dan
Wang, Chunzhi
Wu, Xinyun
Yan, Lingyu
Pan, Deng
Wang, Shuanghong
description Data deduplication, though efficient at removing duplicate chunks, introduces chunk fragmentation, which decreases restore performance. Rewriting algorithms have been proposed to reduce this fragmentation. Delta compression is often used as a complement to data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation, which stems from improperly delta-compressing chunks whose base chunks are fragmented. This new type of fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. To address this problem, we propose SDC, a scheme that performs post-deduplication delta compression only for chunks whose bases can be found directly in the restore cache, eliminating additional disk reads for base chunks and thus avoiding the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decreases restore performance, and these fragmented chunks can serve as bases, decreasing restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, and achieves more than 95 percent of its compression gains.
doi_str_mv 10.1109/TPDS.2020.2991030
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2402498875</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9080096</ieee_id><sourcerecordid>2402498875</sourcerecordid><originalsourceid>FETCH-LOGICAL-c293t-fb9db40118dd4ea531a7e3462c5053b9778e14af2ca48e000d68a16f50faf67f3</originalsourceid><addsrcrecordid>eNo9kF9LwzAUxYMoOKcfQHwp-Nx5kyZt8qibfwYDh5vPJW1vpHNNa9IK-_ambPh0D5ffuZdzCLmlMKMU1MN2vdjMGDCYMaUoJHBGJlQIGTMqk_OggYtYMaouyZX3OwDKBfAJwWXTufa3tl_RB_q-dRit0ZnWNdqWGAURLW28qi1GT7r8Hrpoc_A9NtG8bYrajr4FVkO3r0vd162NtK3CZt_rkegceh-21-TC6L3Hm9Ocks-X5-38LV69vy7nj6u4ZCrpY1OoquBAqawqjlokVGeY8JSVAkRSqCyTSLk2rNRcIgBUqdQ0NQKMNmlmkim5P94NmX6GkCfftYOz4WXOODCupMxEoOiRKl3rvUOTd65utDvkFPKxzXxsMx_bzE9tBs_d0VMj4j-vQAKoNPkDxtBxVg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2402498875</pqid></control><display><type>article</type><title>Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression</title><source>IEEE</source><creator>Zhang, Yucheng ; Yuan, Ye ; Feng, Dan ; Wang, Chunzhi ; Wu, Xinyun ; Yan, Lingyu ; Pan, Deng ; Wang, Shuanghong</creator><creatorcontrib>Zhang, Yucheng ; Yuan, Ye ; Feng, Dan ; Wang, Chunzhi ; Wu, Xinyun ; Yan, Lingyu ; Pan, Deng ; Wang, Shuanghong</creatorcontrib><description>Data deduplication, though being efficient in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation. Delta compression is often used as a complement for data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation stemming from improper delta compression for chunks of which the base chunks are fragmented. 
The new type of chunk fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. To address this problem, we propose SDC, a scheme performing post-deduplication delta compression only for the chunks of which the bases can be directly found in the restore cache to eliminate additional disk reads for base chunks, thus avoiding the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decrease restore performance, and these fragmented chunks can serve as bases to decrease the restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, and achieves more than 95 percent of its compression gains.</description><identifier>ISSN: 1045-9219</identifier><identifier>EISSN: 1558-2183</identifier><identifier>DOI: 10.1109/TPDS.2020.2991030</identifier><identifier>CODEN: ITDSEO</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Acceleration ; Algorithms ; chunk fragmentation ; Computer science ; Containers ; Data compression ; Data deduplication ; Data storage systems ; delta compression ; Fragmentation ; Indexes ; Measurement ; Redundancy ; restore performance ; storage system</subject><ispartof>IEEE transactions on parallel and distributed systems, 2020-10, Vol.31 (10), p.2302-2314</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c293t-fb9db40118dd4ea531a7e3462c5053b9778e14af2ca48e000d68a16f50faf67f3</citedby><cites>FETCH-LOGICAL-c293t-fb9db40118dd4ea531a7e3462c5053b9778e14af2ca48e000d68a16f50faf67f3</cites><orcidid>0000-0001-7716-1214</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9080096$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9080096$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhang, Yucheng</creatorcontrib><creatorcontrib>Yuan, Ye</creatorcontrib><creatorcontrib>Feng, Dan</creatorcontrib><creatorcontrib>Wang, Chunzhi</creatorcontrib><creatorcontrib>Wu, Xinyun</creatorcontrib><creatorcontrib>Yan, Lingyu</creatorcontrib><creatorcontrib>Pan, Deng</creatorcontrib><creatorcontrib>Wang, Shuanghong</creatorcontrib><title>Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression</title><title>IEEE transactions on parallel and distributed systems</title><addtitle>TPDS</addtitle><description>Data deduplication, though being efficient in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation. Delta compression is often used as a complement for data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation stemming from improper delta compression for chunks of which the base chunks are fragmented. 
The new type of chunk fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. To address this problem, we propose SDC, a scheme performing post-deduplication delta compression only for the chunks of which the bases can be directly found in the restore cache to eliminate additional disk reads for base chunks, thus avoiding the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decrease restore performance, and these fragmented chunks can serve as bases to decrease the restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, and achieves more than 95 percent of its compression gains.</description><subject>Acceleration</subject><subject>Algorithms</subject><subject>chunk fragmentation</subject><subject>Computer science</subject><subject>Containers</subject><subject>Data compression</subject><subject>Data deduplication</subject><subject>Data storage systems</subject><subject>delta compression</subject><subject>Fragmentation</subject><subject>Indexes</subject><subject>Measurement</subject><subject>Redundancy</subject><subject>restore performance</subject><subject>storage 
system</subject><issn>1045-9219</issn><issn>1558-2183</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kF9LwzAUxYMoOKcfQHwp-Nx5kyZt8qibfwYDh5vPJW1vpHNNa9IK-_ambPh0D5ffuZdzCLmlMKMU1MN2vdjMGDCYMaUoJHBGJlQIGTMqk_OggYtYMaouyZX3OwDKBfAJwWXTufa3tl_RB_q-dRit0ZnWNdqWGAURLW28qi1GT7r8Hrpoc_A9NtG8bYrajr4FVkO3r0vd162NtK3CZt_rkegceh-21-TC6L3Hm9Ocks-X5-38LV69vy7nj6u4ZCrpY1OoquBAqawqjlokVGeY8JSVAkRSqCyTSLk2rNRcIgBUqdQ0NQKMNmlmkim5P94NmX6GkCfftYOz4WXOODCupMxEoOiRKl3rvUOTd65utDvkFPKxzXxsMx_bzE9tBs_d0VMj4j-vQAKoNPkDxtBxVg</recordid><startdate>20201001</startdate><enddate>20201001</enddate><creator>Zhang, Yucheng</creator><creator>Yuan, Ye</creator><creator>Feng, Dan</creator><creator>Wang, Chunzhi</creator><creator>Wu, Xinyun</creator><creator>Yan, Lingyu</creator><creator>Pan, Deng</creator><creator>Wang, Shuanghong</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-7716-1214</orcidid></search><sort><creationdate>20201001</creationdate><title>Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression</title><author>Zhang, Yucheng ; Yuan, Ye ; Feng, Dan ; Wang, Chunzhi ; Wu, Xinyun ; Yan, Lingyu ; Pan, Deng ; Wang, Shuanghong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c293t-fb9db40118dd4ea531a7e3462c5053b9778e14af2ca48e000d68a16f50faf67f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Acceleration</topic><topic>Algorithms</topic><topic>chunk fragmentation</topic><topic>Computer science</topic><topic>Containers</topic><topic>Data compression</topic><topic>Data deduplication</topic><topic>Data storage systems</topic><topic>delta compression</topic><topic>Fragmentation</topic><topic>Indexes</topic><topic>Measurement</topic><topic>Redundancy</topic><topic>restore performance</topic><topic>storage system</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Yucheng</creatorcontrib><creatorcontrib>Yuan, Ye</creatorcontrib><creatorcontrib>Feng, Dan</creatorcontrib><creatorcontrib>Wang, Chunzhi</creatorcontrib><creatorcontrib>Wu, Xinyun</creatorcontrib><creatorcontrib>Yan, Lingyu</creatorcontrib><creatorcontrib>Pan, Deng</creatorcontrib><creatorcontrib>Wang, Shuanghong</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE</collection><collection>CrossRef</collection><collection>Computer and Information Systems 
Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on parallel and distributed systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Yucheng</au><au>Yuan, Ye</au><au>Feng, Dan</au><au>Wang, Chunzhi</au><au>Wu, Xinyun</au><au>Yan, Lingyu</au><au>Pan, Deng</au><au>Wang, Shuanghong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression</atitle><jtitle>IEEE transactions on parallel and distributed systems</jtitle><stitle>TPDS</stitle><date>2020-10-01</date><risdate>2020</risdate><volume>31</volume><issue>10</issue><spage>2302</spage><epage>2314</epage><pages>2302-2314</pages><issn>1045-9219</issn><eissn>1558-2183</eissn><coden>ITDSEO</coden><abstract>Data deduplication, though being efficient in removing duplicate chunks, introduces chunk fragmentation which decreases restore performance. Rewriting algorithms are proposed to reduce the chunk fragmentation. Delta compression is often used as a complement for data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation stemming from improper delta compression for chunks of which the base chunks are fragmented. The new type of chunk fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. 
To address this problem, we propose SDC, a scheme performing post-deduplication delta compression only for the chunks of which the bases can be directly found in the restore cache to eliminate additional disk reads for base chunks, thus avoiding the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decrease restore performance, and these fragmented chunks can serve as bases to decrease the restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, and achieves more than 95 percent of its compression gains.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TPDS.2020.2991030</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0001-7716-1214</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1045-9219
ispartof IEEE transactions on parallel and distributed systems, 2020-10, Vol.31 (10), p.2302-2314
issn 1045-9219
1558-2183
language eng
recordid cdi_proquest_journals_2402498875
source IEEE
subjects Acceleration
Algorithms
chunk fragmentation
Computer science
Containers
Data compression
Data deduplication
Data storage systems
delta compression
Fragmentation
Indexes
Measurement
Redundancy
restore performance
storage system
title Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T18%3A28%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Restore%20Performance%20for%20In-Line%20Backup%20System%20Combining%20Deduplication%20and%20Delta%20Compression&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Zhang,%20Yucheng&rft.date=2020-10-01&rft.volume=31&rft.issue=10&rft.spage=2302&rft.epage=2314&rft.pages=2302-2314&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2020.2991030&rft_dat=%3Cproquest_RIE%3E2402498875%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2402498875&rft_id=info:pmid/&rft_ieee_id=9080096&rfr_iscdi=true
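
The description field of this record says that SDC delta-compresses a chunk only when its base can be found directly in the restore cache. The Python sketch below illustrates that selection rule in a simplified form. It is not the authors' implementation: the LRU container cache, the names SimulatedRestoreCache and plan_backup, and the input layout are all hypothetical assumptions made for illustration, and the paper's actual design may differ.

```python
from collections import OrderedDict


class SimulatedRestoreCache:
    """Hypothetical LRU cache of container IDs (an assumption, not from the paper).

    It stands in for the containers a later restore would be holding in memory.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._containers = OrderedDict()

    def touch(self, container_id):
        # Mark the container as most recently used; evict the oldest if full.
        if container_id in self._containers:
            self._containers.move_to_end(container_id)
            return
        if len(self._containers) >= self.capacity:
            self._containers.popitem(last=False)
        self._containers[container_id] = True

    def __contains__(self, container_id):
        return container_id in self._containers


def plan_backup(chunks, base_location, cache):
    """Decide per chunk whether to store it as a delta or in full.

    chunks: list of (chunk_id, candidate_base_id or None, container_id),
            in backup-stream order.
    base_location: dict mapping a base chunk ID to the container that holds it.
    Returns a list of (chunk_id, "delta" | "full").
    """
    plan = []
    for chunk_id, base_id, container_id in chunks:
        base_container = base_location.get(base_id)
        # Delta-compress only if the base would already be cached at restore
        # time, so restoring this chunk needs no extra disk read for the base.
        if base_container is not None and base_container in cache:
            plan.append((chunk_id, "delta"))
        else:
            plan.append((chunk_id, "full"))
        # The chunk's own container is what a restore would load next.
        cache.touch(container_id)
    return plan


if __name__ == "__main__":
    demo_cache = SimulatedRestoreCache(capacity=2)
    demo_chunks = [("a", None, 1), ("b", "a", 1), ("c", None, 2)]
    for chunk_id, decision in plan_backup(demo_chunks, {"a": 1}, demo_cache):
        print(chunk_id, decision)
```

In this sketch a chunk whose base sits in a container that has already been evicted from the simulated cache is stored in full, trading some compression for fewer disk reads during restore. That is the trade-off the abstract reports as retaining more than 95 percent of the compression gains.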