LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead

Bibliographic details
Published in: IEEE Transactions on Communications, 2018-02, Vol. 66 (2), pp. 507-520
Main authors: Park, Hyegyeong; Lee, Dongwon; Moon, Jaekyun
Format: Article
Language: English
container_end_page 520
container_issue 2
container_start_page 507
container_title IEEE transactions on communications
container_volume 66
creator Park, Hyegyeong
Lee, Dongwon
Moon, Jaekyun
description Distributed storage systems generate heavy repair traffic because storage nodes fail frequently. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the number of blocks that must be downloaded for repair, thanks to the sparsity of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth is achieved by enforcing a regular check-node degree in the factor graph. Moreover, it is shown that, for a fixed code rate, the variable-node degree should also be regular to minimize repair bandwidth, under a reasonable constraint on the minimum variable-node degree. It is also shown that, for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the Reed-Solomon codes currently in use. Our reliability analysis is based on a general equation for the mean-time-to-data-loss (MTTDL) of LDPC codes. This formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed so that a small loss in repair-bandwidth optimality is traded for a large improvement in erasure-correction capability and thus in MTTDL.
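The claim that a regular check-node degree minimizes repair bandwidth can be illustrated with a simplified repair model. The Python sketch below is an assumption for illustration, not the authors' formula. It treats each stored block as a variable node and assumes that a lost block is rebuilt through one neighboring parity check, chosen with probability proportional to that check's degree (a uniformly random edge of the factor graph). That check then needs its other d_c - 1 blocks. The function name `avg_repair_download` and the example degree sequences are hypothetical.

```python
from fractions import Fraction


def avg_repair_download(check_degrees):
    """Expected number of blocks downloaded to rebuild one lost block.

    Hypothetical simplified model (not the paper's formula): the lost block
    is repaired through one parity check chosen with probability
    proportional to its degree, i.e. via a uniformly random edge of the
    factor graph, and that check needs its other d_c - 1 blocks.
    """
    total_edges = sum(check_degrees)
    # A check of degree d owns d edges; repairing through any of them costs d - 1.
    weighted_cost = sum(d * (d - 1) for d in check_degrees)
    return Fraction(weighted_cost, total_edges)


# Both graphs have 4 checks and 24 edges, so the average check degree is 6.
regular = [6, 6, 6, 6]
irregular = [4, 5, 7, 8]

print(avg_repair_download(regular))    # 5
print(avg_repair_download(irregular))  # 65/12 (about 5.42)
```

Under this model the average download is sum(d_j^2) / sum(d_j) - 1. When the number of checks and edges is fixed (a fixed average check degree), the Cauchy-Schwarz inequality makes sum(d_j^2) smallest when every d_j is equal. That is why the regular sequence gives the lower cost here.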
doi_str_mv 10.1109/TCOMM.2017.2769116
format Article
publisher IEEE, New York
coden IECMBT
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
peer_reviewed true
orcid https://orcid.org/0000-0003-3686-8891 ; https://orcid.org/0000-0003-0993-5788
ieee_id 8094003
link https://ieeexplore.ieee.org/document/8094003
fulltext fulltext_linktorsrc
identifier ISSN: 0090-6778
ispartof IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520
issn 0090-6778
1558-0857
language eng
recordid cdi_ieee_primary_8094003
source IEEE Electronic Library (IEL)
subjects Bandwidth
Binary system
Codes
Distributed storage
Encoding
Error correcting codes
Facebook
factor graph
Graph representations
Graphical representations
Low density parity check codes
low-density parity-check (LDPC) codes
Maintenance engineering
mean-time-to-data-loss (MTTDL)
Nodes
Parity check codes
Reed-Solomon codes
Reliability analysis
Reliability aspects
Reliability engineering
Repair
repair bandwidth
Storage systems
title LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T07%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LDPC%20Code%20Design%20for%20Distributed%20Storage:%20Balancing%20Repair%20Bandwidth,%20Reliability,%20and%20Storage%20Overhead&rft.jtitle=IEEE%20transactions%20on%20communications&rft.au=Park,%20Hyegyeong&rft.date=2018-02-01&rft.volume=66&rft.issue=2&rft.spage=507&rft.epage=520&rft.pages=507-520&rft.issn=0090-6778&rft.eissn=1558-0857&rft.coden=IECMBT&rft_id=info:doi/10.1109/TCOMM.2017.2769116&rft_dat=%3Cproquest_RIE%3E2174490403%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174490403&rft_id=info:pmid/&rft_ieee_id=8094003&rfr_iscdi=true