Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression
Published in: | IEEE transactions on parallel and distributed systems 2020-10, Vol.31 (10), p.2302-2314 |
---|---|
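Main authors: | Zhang, Yucheng; Yuan, Ye; Feng, Dan; Wang, Chunzhi; Wu, Xinyun; Yan, Lingyu; Pan, Deng; Wang, Shuanghong |
Format: | Article |
Language: | English |
Subjects: | Acceleration; Algorithms; chunk fragmentation; Computer science; Containers; Data compression; Data deduplication; Data storage systems; delta compression; Fragmentation; Indexes; Measurement; Redundancy; restore performance; storage system |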
container_end_page | 2314 |
---|---|
container_issue | 10 |
container_start_page | 2302 |
container_title | IEEE transactions on parallel and distributed systems |
container_volume | 31 |
creator | Zhang, Yucheng; Yuan, Ye; Feng, Dan; Wang, Chunzhi; Wu, Xinyun; Yan, Lingyu; Pan, Deng; Wang, Shuanghong |
description | Data deduplication, though efficient at removing duplicate chunks, introduces chunk fragmentation, which decreases restore performance. Rewriting algorithms have been proposed to reduce this fragmentation. Delta compression is often used as a complement to data deduplication to further improve storage efficiency. We observe that delta compression introduces a new type of chunk fragmentation, stemming from improper delta compression of chunks whose base chunks are fragmented. This new type of fragmentation severely decreases restore performance and cannot be addressed by existing rewriting algorithms. To address this problem, we propose SDC, a scheme that performs post-deduplication delta compression only for chunks whose bases can be found directly in the restore cache; this eliminates additional disk reads for base chunks and thus avoids the new type of chunk fragmentation. In addition, self-referenced chunks can be fragmented, which decreases restore performance, and these fragmented chunks can serve as bases that degrade restore performance repeatedly. We propose a hybrid rewriting scheme for SDC to rewrite such fragmented chunks. Experimental results show that SDC improves the restore performance of the approach that directly performs delta compression after data deduplication by 2.9-16.9x, while achieving more than 95 percent of its compression gains. |
doi_str_mv | 10.1109/TPDS.2020.2991030 |
format | Article |
publisher | New York: IEEE |
date | 2020-10-01 |
pages | 13 |
coden | ITDSEO |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
review status | Peer reviewed |
orcid | https://orcid.org/0000-0001-7716-1214 |
link | https://ieeexplore.ieee.org/document/9080096 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1045-9219 |
ispartof | IEEE transactions on parallel and distributed systems, 2020-10, Vol.31 (10), p.2302-2314 |
issn | 1045-9219 (print); 1558-2183 (electronic) |
language | eng |
recordid | cdi_proquest_journals_2402498875 |
source | IEEE |
subjects | Acceleration; Algorithms; chunk fragmentation; Computer science; Containers; Data compression; Data deduplication; Data storage systems; delta compression; Fragmentation; Indexes; Measurement; Redundancy; restore performance; storage system |
title | Improving Restore Performance for In-Line Backup System Combining Deduplication and Delta Compression |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T18%3A28%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Restore%20Performance%20for%20In-Line%20Backup%20System%20Combining%20Deduplication%20and%20Delta%20Compression&rft.jtitle=IEEE%20transactions%20on%20parallel%20and%20distributed%20systems&rft.au=Zhang,%20Yucheng&rft.date=2020-10-01&rft.volume=31&rft.issue=10&rft.spage=2302&rft.epage=2314&rft.pages=2302-2314&rft.issn=1045-9219&rft.eissn=1558-2183&rft.coden=ITDSEO&rft_id=info:doi/10.1109/TPDS.2020.2991030&rft_dat=%3Cproquest_RIE%3E2402498875%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2402498875&rft_id=info:pmid/&rft_ieee_id=9080096&rfr_iscdi=true |
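The central rule in the abstract, delta-compressing a chunk only when its base can already be found in the restore cache, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the restore cache is modeled as an LRU cache of whole containers, resemblance detection is assumed to have already supplied each chunk's candidate base container, and the names `LRUContainerCache`, `RecipeEntry`, `build_recipe`, and `count_restore_reads` are hypothetical. The hybrid rewriting scheme is not modeled.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Optional, Tuple


class LRUContainerCache:
    """Hypothetical restore cache (assumed LRU over whole containers)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._slots = OrderedDict()  # container_id -> None, ordered by recency

    def contains(self, container_id: int) -> bool:
        return container_id in self._slots

    def access(self, container_id: int) -> bool:
        """Touch a container; return True if the access required a disk read."""
        if container_id in self._slots:
            self._slots.move_to_end(container_id)  # hit: mark most recently used
            return False
        self._slots[container_id] = None  # miss: read the container from disk
        if len(self._slots) > self.capacity:
            self._slots.popitem(last=False)  # evict the least recently used container
        return True


@dataclass
class RecipeEntry:
    container_id: int                     # container storing the chunk (or its delta)
    base_container: Optional[int] = None  # base chunk's container if delta-compressed


def build_recipe(stream: List[Tuple[int, Optional[int]]], cache_size: int,
                 cache_aware: bool) -> List[RecipeEntry]:
    """Decide per chunk whether to store a delta.

    stream items: (container the chunk is written to, container of a similar
    base chunk or None; the base is assumed to come from resemblance detection).
    cache_aware=False delta-compresses every chunk that has a base;
    cache_aware=True does so only if the base's container is already in the
    simulated restore cache, as the abstract describes for SDC.
    """
    sim = LRUContainerCache(cache_size)  # mirrors the restore cache during backup
    recipe = []
    for container_id, base in stream:
        use_delta = base is not None and (not cache_aware or sim.contains(base))
        entry = RecipeEntry(container_id, base if use_delta else None)
        # Replay the accesses the restore will make, in the same order.
        if entry.base_container is not None:
            sim.access(entry.base_container)
        sim.access(entry.container_id)
        recipe.append(entry)
    return recipe


def count_restore_reads(recipe: List[RecipeEntry], cache_size: int) -> int:
    """Replay a recipe against a cold restore cache and count container reads."""
    cache = LRUContainerCache(cache_size)
    reads = 0
    for entry in recipe:
        if entry.base_container is not None:
            reads += int(cache.access(entry.base_container))  # base needed to decode
        reads += int(cache.access(entry.container_id))
    return reads


if __name__ == "__main__":
    # New chunks go to container 10; most candidate bases sit in old, scattered containers.
    stream = [(10, None), (10, 1), (10, 2), (10, 3), (10, 10), (10, 4)]
    for aware in (False, True):
        recipe = build_recipe(stream, cache_size=2, cache_aware=aware)
        deltas = sum(e.base_container is not None for e in recipe)
        print(aware, deltas, count_restore_reads(recipe, cache_size=2))
    # prints "False 5 5" then "True 1 1"
```

The toy run shows the trade-off the abstract describes: the cache-aware rule keeps only one of the five candidate deltas but needs a single container read during restore instead of five, because it never makes a chunk depend on a base stored in a fragmented, uncached container. The compression retained on real backup workloads (reported in the paper as more than 95 percent) is not something this sketch can reproduce.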