Cluster-Aware Scattered Repair in Erasure-Coded Storage: Design and Analysis

Bibliographic details
Published in: IEEE Transactions on Computers, 2021-11, Vol. 70 (11), pp. 1861-1874
Main authors: Shen, Zhirong; Lin, Shiyao; Shu, Jiwu; Xie, Chengxin; Huang, Zhijie; Fu, Yingxun
Format: Article
Language: English
container_end_page 1874
container_issue 11
container_start_page 1861
container_title IEEE transactions on computers
container_volume 70
creator Shen, Zhirong
Lin, Shiyao
Shu, Jiwu
Xie, Chengxin
Huang, Zhijie
Fu, Yingxun
description Erasure coding is a storage-efficient means to guarantee data reliability in today's commodity storage systems, yet its repair performance is seriously hindered by the substantial repair traffic. Repair in clustered storage systems is even more complicated because of the scarcity of cross-cluster bandwidth. We present ClusterSR, a cluster-aware scattered repair approach. ClusterSR minimizes the cross-cluster repair traffic by carefully choosing the clusters for reading and repairing chunks. It further balances the cross-cluster repair traffic by scheduling the repair of multiple chunks. Large-scale simulation and Alibaba Cloud ECS experiments show that ClusterSR can reduce the cross-cluster repair traffic by 5.6-52.7 percent and improve the repair throughput by 14.4-68.8 percent.
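The abstract names two mechanisms (choosing the clusters to read from and repair into, and scheduling many repairs to balance cross-cluster traffic) without detailing them. The Python sketch below only illustrates those two ideas under loudly stated assumptions; it is not ClusterSR's algorithm. It assumes a linear (k, m) code where any k surviving chunks rebuild a lost one, that chunks read inside one cluster can be combined into a single chunk-sized partial result (so each remote source cluster ships exactly one chunk), and that a placement policy supplies the allowed destination clusters. The functions `plan_repair` and `schedule_repairs`, the greedy selection, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def plan_repair(surviving, k, dest_candidates, inbound):
    """Hypothetical planner: pick a destination cluster and sources for one lost chunk.

    surviving: cluster id -> number of surviving chunks of this stripe there
    k: chunks needed to decode (an assumed linear (k, m) code)
    dest_candidates: clusters allowed to store the repaired chunk (assumed
        to come from a placement policy that preserves fault tolerance)
    inbound: Counter of cross-cluster chunks each cluster already receives
    Returns (dest, sources, traffic); sources maps cluster -> chunks read.
    """
    best = None
    for dest in dest_candidates:
        # Chunks already in the destination cluster are read locally.
        local = min(surviving.get(dest, 0), k)
        sources = {dest: local} if local else {}
        need = k - local
        # Fewest remote clusters: take those holding the most chunks first.
        remote = sorted((c for c, n in surviving.items() if c != dest and n > 0),
                        key=lambda c: (-surviving[c], c))
        for c in remote:
            if need == 0:
                break
            take = min(surviving[c], need)
            sources[c] = take
            need -= take
        if need > 0:
            continue  # this destination cannot gather k chunks
        # Assumed aggregation: each remote cluster sends one partial result.
        traffic = sum(1 for c in sources if c != dest)
        # Minimize traffic first, then prefer the least-loaded destination.
        key = (traffic, inbound[dest], dest)
        if best is None or key < best[0]:
            best = (key, dest, sources, traffic)
    if best is None:
        raise ValueError("fewer than k surviving chunks")
    _, dest, sources, traffic = best
    return dest, sources, traffic


def schedule_repairs(lost_chunks, k):
    """Hypothetical scheduler: plan repairs in order, tracking inbound load."""
    inbound = Counter()
    plans = []
    for surviving, candidates in lost_chunks:
        dest, sources, traffic = plan_repair(surviving, k, candidates, inbound)
        inbound[dest] += traffic
        plans.append((dest, sources, traffic))
    return plans, inbound


# Two stripes of an assumed RS(4, 2) code; each lost one chunk in cluster A.
lost = [({"A": 1, "B": 2, "C": 2}, ["A", "D"]),
        ({"A": 1, "B": 2, "C": 2}, ["A", "D"])]
plans, inbound = schedule_repairs(lost, k=4)
for dest, sources, traffic in plans:
    print(dest, sources, traffic)
# A {'A': 1, 'B': 2, 'C': 1} 2
# D {'B': 2, 'C': 2} 2
```

Both destinations cost two cross-cluster chunks per repair in this toy layout, so the tie-breaker sends the second repair to D because A already receives two chunks. A real system would also weigh upload load, full-duplex links, and cluster-level fault tolerance, which this sketch deliberately leaves out.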
doi_str_mv 10.1109/TC.2020.3028353
format Article
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
coden ITCOB4
orcidid 0000-0003-2673-5868
0000-0002-5796-7314
date 2021-11-01
tpages 14
toplevel peer_reviewed
linktohtml https://ieeexplore.ieee.org/document/9210857
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9340
ispartof IEEE transactions on computers, 2021-11, Vol.70 (11), p.1861-1874
issn 0018-9340
1557-9956
language eng
recordid cdi_ieee_primary_9210857
source IEEE Electronic Library (IEL)
subjects Bandwidth
Clusters
Computer architecture
Cross-cluster repair traffic
Data centers
Encoding
Fault tolerance
Fault tolerant systems
full duplex transmission
load balancing
Maintenance engineering
Reliability analysis
Repair
scattered repair
Storage systems
title Cluster-Aware Scattered Repair in Erasure-Coded Storage: Design and Analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T01%3A55%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cluster-Aware%20Scattered%20Repair%20in%20Erasure-Coded%20Storage:%20Design%20and%20Analysis&rft.jtitle=IEEE%20transactions%20on%20computers&rft.au=Shen,%20Zhirong&rft.date=2021-11-01&rft.volume=70&rft.issue=11&rft.spage=1861&rft.epage=1874&rft.pages=1861-1874&rft.issn=0018-9340&rft.eissn=1557-9956&rft.coden=ITCOB4&rft_id=info:doi/10.1109/TC.2020.3028353&rft_dat=%3Cproquest_RIE%3E2580099407%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2580099407&rft_id=info:pmid/&rft_ieee_id=9210857&rfr_iscdi=true