LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead
Field | Value |
---|---|
Published in: | IEEE transactions on communications 2018-02, Vol.66 (2), p.507-520 |
Main authors: | Park, Hyegyeong; Lee, Dongwon; Moon, Jaekyun |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Field | Value |
---|---|
container_end_page | 520 |
container_issue | 2 |
container_start_page | 507 |
container_title | IEEE transactions on communications |
container_volume | 66 |
creator | Park, Hyegyeong; Lee, Dongwon; Moon, Jaekyun |
description | Distributed storage systems suffer from significant repair traffic caused by frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the number of block downloads required for repair, thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that, for a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under a reasonable minimum variable node degree constraint. It is also shown that, for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the Reed-Solomon codes currently in use. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus in MTTDL. |
doi_str_mv | 10.1109/TCOMM.2017.2769116 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_8094003</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8094003</ieee_id><sourcerecordid>2174490403</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</originalsourceid><addsrcrecordid>eNo9kMtOwzAQRS0EEqXwA7CxxLYpYyeOE3aQ8pJaFUFZW048SV2VpNgpqH9P-oDVSFdz7mgOIZcMhoxBejPLppPJkAOTQy7jlLH4iPSYEEkAiZDHpAeQQhBLmZySM-8XABBBGPZINR69ZjRrDNIRelvVtGwcHVnfOpuvWzT0vW2crvCW3uulrgtbV_QNV9q6LqjNjzXtfNAlS6tzu7TtZkC7-I-i0290c9TmnJyUeunx4jD75OPxYZY9B-Pp00t2Nw4Knoo2KE2ohTEhSwQzDHkeAuNCJxpjFGUCecSY0Dnn3BSRlrzMZRGjFrLMY250GvbJ9b535ZqvNfpWLZq1q7uTijMZRenu7z7h-63CNd47LNXK2U_tNoqB2gpVO6FqK1QdhHbQ1R6yiPgPJJBG0FX-Al_LcbY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2174490403</pqid></control><display><type>article</type><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><source>IEEE Electronic Library (IEL)</source><creator>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</creator><creatorcontrib>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</creatorcontrib><description>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. 
This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</description><identifier>ISSN: 0090-6778</identifier><identifier>EISSN: 1558-0857</identifier><identifier>DOI: 10.1109/TCOMM.2017.2769116</identifier><identifier>CODEN: IECMBT</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Bandwidth ; Binary system ; Codes ; Distributed storage ; Encoding ; Error correcting codes ; Facebook ; factor graph ; Graph representations ; Graphical representations ; Low density parity check codes ; low-density parity-check (LDPC) codes ; Maintenance engineering ; mean-time-to-data-loss (MTTDL) ; Nodes ; Parity check codes ; Reed-Solomon codes ; Reliability analysis ; Reliability aspects ; Reliability engineering ; Repair ; repair bandwidth ; Storage systems</subject><ispartof>IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</citedby><cites>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</cites><orcidid>0000-0003-3686-8891 ; 0000-0003-0993-5788</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8094003$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8094003$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Park, Hyegyeong</creatorcontrib><creatorcontrib>Lee, Dongwon</creatorcontrib><creatorcontrib>Moon, Jaekyun</creatorcontrib><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><title>IEEE transactions on communications</title><addtitle>TCOMM</addtitle><description>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. 
Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</description><subject>Bandwidth</subject><subject>Binary system</subject><subject>Codes</subject><subject>Distributed storage</subject><subject>Encoding</subject><subject>Error correcting codes</subject><subject>Facebook</subject><subject>factor graph</subject><subject>Graph representations</subject><subject>Graphical representations</subject><subject>Low density parity check codes</subject><subject>low-density parity-check (LDPC) codes</subject><subject>Maintenance engineering</subject><subject>mean-time-to-data-loss (MTTDL)</subject><subject>Nodes</subject><subject>Parity check codes</subject><subject>Reed-Solomon codes</subject><subject>Reliability analysis</subject><subject>Reliability aspects</subject><subject>Reliability engineering</subject><subject>Repair</subject><subject>repair bandwidth</subject><subject>Storage 
systems</subject><issn>0090-6778</issn><issn>1558-0857</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMtOwzAQRS0EEqXwA7CxxLYpYyeOE3aQ8pJaFUFZW048SV2VpNgpqH9P-oDVSFdz7mgOIZcMhoxBejPLppPJkAOTQy7jlLH4iPSYEEkAiZDHpAeQQhBLmZySM-8XABBBGPZINR69ZjRrDNIRelvVtGwcHVnfOpuvWzT0vW2crvCW3uulrgtbV_QNV9q6LqjNjzXtfNAlS6tzu7TtZkC7-I-i0290c9TmnJyUeunx4jD75OPxYZY9B-Pp00t2Nw4Knoo2KE2ohTEhSwQzDHkeAuNCJxpjFGUCecSY0Dnn3BSRlrzMZRGjFrLMY250GvbJ9b535ZqvNfpWLZq1q7uTijMZRenu7z7h-63CNd47LNXK2U_tNoqB2gpVO6FqK1QdhHbQ1R6yiPgPJJBG0FX-Al_LcbY</recordid><startdate>20180201</startdate><enddate>20180201</enddate><creator>Park, Hyegyeong</creator><creator>Lee, Dongwon</creator><creator>Moon, Jaekyun</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0003-3686-8891</orcidid><orcidid>https://orcid.org/0000-0003-0993-5788</orcidid></search><sort><creationdate>20180201</creationdate><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><author>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Bandwidth</topic><topic>Binary system</topic><topic>Codes</topic><topic>Distributed storage</topic><topic>Encoding</topic><topic>Error correcting codes</topic><topic>Facebook</topic><topic>factor graph</topic><topic>Graph representations</topic><topic>Graphical representations</topic><topic>Low density parity check 
codes</topic><topic>low-density parity-check (LDPC) codes</topic><topic>Maintenance engineering</topic><topic>mean-time-to-data-loss (MTTDL)</topic><topic>Nodes</topic><topic>Parity check codes</topic><topic>Reed-Solomon codes</topic><topic>Reliability analysis</topic><topic>Reliability aspects</topic><topic>Reliability engineering</topic><topic>Repair</topic><topic>repair bandwidth</topic><topic>Storage systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Park, Hyegyeong</creatorcontrib><creatorcontrib>Lee, Dongwon</creatorcontrib><creatorcontrib>Moon, Jaekyun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on communications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Park, Hyegyeong</au><au>Lee, Dongwon</au><au>Moon, Jaekyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</atitle><jtitle>IEEE transactions on communications</jtitle><stitle>TCOMM</stitle><date>2018-02-01</date><risdate>2018</risdate><volume>66</volume><issue>2</issue><spage>507</spage><epage>520</epage><pages>507-520</pages><issn>0090-6778</issn><eissn>1558-0857</eissn><coden>IECMBT</coden><abstract>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. 
This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCOMM.2017.2769116</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-3686-8891</orcidid><orcidid>https://orcid.org/0000-0003-0993-5788</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0090-6778 |
ispartof | IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520 |
issn | 0090-6778 (print); 1558-0857 (electronic) |
language | eng |
recordid | cdi_ieee_primary_8094003 |
source | IEEE Electronic Library (IEL) |
subjects | Bandwidth; Binary system; Codes; Distributed storage; Encoding; Error correcting codes; factor graph; Graph representations; Graphical representations; Low density parity check codes; low-density parity-check (LDPC) codes; Maintenance engineering; mean-time-to-data-loss (MTTDL); Nodes; Parity check codes; Reed-Solomon codes; Reliability analysis; Reliability aspects; Reliability engineering; Repair; repair bandwidth; Storage systems |
title | LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T07%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LDPC%20Code%20Design%20for%20Distributed%20Storage:%20Balancing%20Repair%20Bandwidth,%20Reliability,%20and%20Storage%20Overhead&rft.jtitle=IEEE%20transactions%20on%20communications&rft.au=Park,%20Hyegyeong&rft.date=2018-02-01&rft.volume=66&rft.issue=2&rft.spage=507&rft.epage=520&rft.pages=507-520&rft.issn=0090-6778&rft.eissn=1558-0857&rft.coden=IECMBT&rft_id=info:doi/10.1109/TCOMM.2017.2769116&rft_dat=%3Cproquest_RIE%3E2174490403%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174490403&rft_id=info:pmid/&rft_ieee_id=8094003&rfr_iscdi=true |
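
The abstract attributes the lower repair traffic to the sparsity of the factor graph: a lost block can be rebuilt from the other members of a single parity check, instead of from k surviving blocks as in conventional repair of an (n, k) Reed-Solomon code. The Python sketch below illustrates only that idea. It assumes a binary code over equal-length byte blocks, a hypothetical toy parity-check matrix `H_ROWS` (each row lists the stored blocks whose XOR is zero), and a simple repair rule that uses the smallest check containing the failed block. The names `encode`, `repair`, and `xor_blocks` are illustrative and are not the construction or repair procedure from the paper.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical toy parity-check matrix: each row lists the stored blocks whose
# XOR must be zero. Blocks 0-2 hold data; blocks 3-5 hold parity.
H_ROWS: List[List[int]] = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: List[bytes]) -> List[bytes]:
    """Append the parity blocks implied by H_ROWS (expects three data blocks)."""
    b0, b1, b2 = data
    return [b0, b1, b2,
            xor_blocks([b0, b1]),   # satisfies row [0, 1, 3]
            xor_blocks([b1, b2]),   # satisfies row [1, 2, 4]
            xor_blocks([b0, b2])]   # satisfies row [0, 2, 5]


def repair(h_rows: List[List[int]], stored: Dict[int, bytes],
           failed: int) -> Tuple[bytes, List[int]]:
    """Rebuild block `failed` through the smallest check that contains it.

    Returns the rebuilt block and the indices of the downloaded helper
    blocks; the download count equals that check's degree minus one.
    """
    checks = [row for row in h_rows if failed in row]
    if not checks:
        raise ValueError(f"block {failed} is not covered by any check")
    best = min(checks, key=len)
    helpers = [j for j in best if j != failed]
    return xor_blocks([stored[j] for j in helpers]), helpers


blocks = encode([b"\x01\x02", b"\x0f\x00", b"\xf0\xff"])
survivors = {j: blk for j, blk in enumerate(blocks) if j != 1}  # block 1 lost
rebuilt, helpers = repair(H_ROWS, survivors, failed=1)
print(rebuilt == blocks[1], helpers)  # True [0, 3]
```

In this toy code every single failure costs two downloads (check degree minus one), versus three for conventional repair of a dimension-3 Reed-Solomon code, which is why the check node degree drives repair bandwidth.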
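
The abstract reports that the minimum average repair bandwidth is reached by forcing a regular check node degree. The paper's formula is not reproduced in this record, so the sketch below uses a deliberately simplified stand-in model, assumed here purely for illustration: each repair goes through a check node chosen with probability proportional to its degree (uniformly over the edges of the factor graph) and downloads the other d - 1 blocks of that check. The expected cost is then the sum over checks of d(d - 1), divided by the number of edges E. Because d(d - 1) is convex, this is smallest, for a fixed E and a fixed number of checks, when all check degrees are equal. The function name `edge_avg_repair_cost` is hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def edge_avg_repair_cost(check_degrees: Sequence[int]) -> Fraction:
    """Expected downloads per repair in a simplified, assumed model.

    Each repair picks an edge of the factor graph uniformly at random and
    repairs through the check node at its end, downloading deg - 1 blocks.
    A check of degree d is therefore used with probability d / E.
    """
    edges = sum(check_degrees)
    if edges == 0:
        raise ValueError("factor graph has no edges")
    return Fraction(sum(d * (d - 1) for d in check_degrees), edges)


# Same number of checks (10) and edges (60), different degree profiles.
regular = [6] * 10                # every check has degree 6
irregular = [4] * 5 + [8] * 5     # same average degree, uneven spread

print(edge_avg_repair_cost(regular))    # 5
print(edge_avg_repair_cost(irregular))  # 17/3
```

Under this toy model the regular profile averages 5 downloads per repair and the uneven one 17/3 (about 5.67) with the same edge count; the paper's actual formula and its variable-node-degree result are not captured here.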
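
The abstract links the MTTDL to the stopping number. With the standard iterative (peeling) erasure decoder, decoding stalls exactly when the remaining erased nodes form a stopping set: a set of variable nodes such that every check touching the set touches it at least twice. The sketch below implements peeling and a brute-force search for the smallest stopping set on a hypothetical toy parity-check matrix `H_ROWS`. Treating the paper's "stopping number" as this minimum stopping-set size is an assumption made here, and the brute-force search is only practical for very small codes.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set

# Hypothetical toy code: each row lists the variable nodes joined by a check.
H_ROWS: List[List[int]] = [[0, 1, 3], [1, 2, 4], [0, 2, 5]]


def peel(h_rows: List[List[int]], erased: Iterable[int]) -> Set[int]:
    """Peeling (iterative erasure) decoder; returns the unrecovered nodes.

    An empty result means every erasure was recovered. Otherwise the result
    is the largest stopping set contained in the erasure pattern.
    """
    remaining = set(erased)
    progress = True
    while remaining and progress:
        progress = False
        for row in h_rows:
            unknown = [j for j in row if j in remaining]
            if len(unknown) == 1:  # this check pins down its only erasure
                remaining.discard(unknown[0])
                progress = True
    return remaining


def min_stopping_set_size(h_rows: List[List[int]], n: int) -> Optional[int]:
    """Brute-force size of the smallest non-empty stopping set (tiny codes only).

    A set is a stopping set exactly when peeling cannot make any progress on it.
    Returns None if no subset of the n variable nodes is a stopping set.
    """
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if peel(h_rows, subset) == set(subset):
                return size
    return None


print(sorted(peel(H_ROWS, [0, 1])))        # []
print(sorted(peel(H_ROWS, [0, 1, 3, 4])))  # [1, 3, 4]
print(min_stopping_set_size(H_ROWS, 6))    # 3
```

Any erasure pattern smaller than the minimum stopping-set size is guaranteed to be recovered by peeling, so in this toy code every pattern of up to two failed blocks is repairable.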
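
On the reliability side, a common textbook way to estimate MTTDL (not necessarily the general equation derived in the paper) is a birth-death Markov chain: state i counts failed nodes, each of the n - i healthy nodes fails at rate lam, one repair completes at rate mu, and data is lost once failures exceed the number t that the code is guaranteed to tolerate. Reading t as "minimum stopping-set size minus one" for a peeling-decoded LDPC code is an assumption that ignores larger erasure patterns that happen to be recoverable. The function `mttdl_birth_death` and its one-repair-at-a-time model are hypothetical.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]


def mttdl_birth_death(n: int, t: int, lam: Number, mu: Number) -> Number:
    """Mean time to data loss of an assumed birth-death reliability model.

    State i (0 <= i <= t) counts failed nodes; reaching t + 1 failures loses data.
    From state i another node fails at rate (n - i) * lam, and for i >= 1 one
    repair completes at rate mu (one repair at a time, a simplifying assumption).
    With D_i = T_i - T_{i+1}, the first-step equations reduce to
    a_i * D_i = 1 + b_i * D_{i-1}, and MTTDL = T_0 = D_0 + ... + D_t.
    """
    if not 0 <= t < n:
        raise ValueError("need 0 <= t < n")
    total: Number = 0
    d_prev: Number = 0
    for i in range(t + 1):
        a_i = (n - i) * lam        # rate of one more node failure
        b_i = mu if i > 0 else 0   # rate of finishing a repair
        d_i = (1 + b_i * d_prev) / a_i
        total += d_i
        d_prev = d_i
    return total


# t = 1 reproduces the classic closed form ((2n - 1) * lam + mu) / (n * (n - 1) * lam**2).
print(mttdl_birth_death(3, 1, Fraction(1), Fraction(10)))  # 5/2
# Tolerating one more failure strictly increases the MTTDL in this model.
print(mttdl_birth_death(6, 2, Fraction(1, 100), Fraction(3))
      > mttdl_birth_death(6, 1, Fraction(1, 100), Fraction(3)))  # True
```

Exact `Fraction` inputs keep the result exact; passing floats works the same way and returns a float.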