LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead

Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature...

Bibliographic details
Published in:	IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520
Main authors:	Park, Hyegyeong; Lee, Dongwon; Moon, Jaekyun
Format:	Article
Language:	eng
Subjects:	Bandwidth; Binary system; Codes; Distributed storage; Encoding; Error correcting codes; Facebook; factor graph; Graph representations; Graphical representations; Low density parity check codes; low-density parity-check (LDPC) codes; Maintenance engineering; mean-time-to-data-loss (MTTDL); Nodes; Parity check codes; Reed-Solomon codes; Reliability analysis; Reliability aspects; Reliability engineering; Repair; repair bandwidth; Storage systems
Online access:	Order full text

container_end_page	520
container_issue	2
container_start_page	507
container_title	IEEE transactions on communications
container_volume	66
creator	Park, Hyegyeong; Lee, Dongwon; Moon, Jaekyun
description	Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.
doi_str_mv	10.1109/TCOMM.2017.2769116
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_8094003</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8094003</ieee_id><sourcerecordid>2174490403</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</originalsourceid><addsrcrecordid>eNo9kMtOwzAQRS0EEqXwA7CxxLYpYyeOE3aQ8pJaFUFZW048SV2VpNgpqH9P-oDVSFdz7mgOIZcMhoxBejPLppPJkAOTQy7jlLH4iPSYEEkAiZDHpAeQQhBLmZySM-8XABBBGPZINR69ZjRrDNIRelvVtGwcHVnfOpuvWzT0vW2crvCW3uulrgtbV_QNV9q6LqjNjzXtfNAlS6tzu7TtZkC7-I-i0290c9TmnJyUeunx4jD75OPxYZY9B-Pp00t2Nw4Knoo2KE2ohTEhSwQzDHkeAuNCJxpjFGUCecSY0Dnn3BSRlrzMZRGjFrLMY250GvbJ9b535ZqvNfpWLZq1q7uTijMZRenu7z7h-63CNd47LNXK2U_tNoqB2gpVO6FqK1QdhHbQ1R6yiPgPJJBG0FX-Al_LcbY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2174490403</pqid></control><display><type>article</type><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><source>IEEE Electronic Library (IEL)</source><creator>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</creator><creatorcontrib>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</creatorcontrib><description>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. 
This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</description><identifier>ISSN: 0090-6778</identifier><identifier>EISSN: 1558-0857</identifier><identifier>DOI: 10.1109/TCOMM.2017.2769116</identifier><identifier>CODEN: IECMBT</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Bandwidth ; Binary system ; Codes ; Distributed storage ; Encoding ; Error correcting codes ; Facebook ; factor graph ; Graph representations ; Graphical representations ; Low density parity check codes ; low-density parity-check (LDPC) codes ; Maintenance engineering ; mean-time-to-data-loss (MTTDL) ; Nodes ; Parity check codes ; Reed-Solomon codes ; Reliability analysis ; Reliability aspects ; Reliability engineering ; Repair ; repair bandwidth ; Storage systems</subject><ispartof>IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</citedby><cites>FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</cites><orcidid>0000-0003-3686-8891 ; 0000-0003-0993-5788</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8094003$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8094003$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Park, Hyegyeong</creatorcontrib><creatorcontrib>Lee, Dongwon</creatorcontrib><creatorcontrib>Moon, Jaekyun</creatorcontrib><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><title>IEEE transactions on communications</title><addtitle>TCOMM</addtitle><description>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. 
Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</description><subject>Bandwidth</subject><subject>Binary system</subject><subject>Codes</subject><subject>Distributed storage</subject><subject>Encoding</subject><subject>Error correcting codes</subject><subject>Facebook</subject><subject>factor graph</subject><subject>Graph representations</subject><subject>Graphical representations</subject><subject>Low density parity check codes</subject><subject>low-density parity-check (LDPC) codes</subject><subject>Maintenance engineering</subject><subject>mean-time-to-data-loss (MTTDL)</subject><subject>Nodes</subject><subject>Parity check codes</subject><subject>Reed-Solomon codes</subject><subject>Reliability analysis</subject><subject>Reliability aspects</subject><subject>Reliability engineering</subject><subject>Repair</subject><subject>repair bandwidth</subject><subject>Storage 
systems</subject><issn>0090-6778</issn><issn>1558-0857</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kMtOwzAQRS0EEqXwA7CxxLYpYyeOE3aQ8pJaFUFZW048SV2VpNgpqH9P-oDVSFdz7mgOIZcMhoxBejPLppPJkAOTQy7jlLH4iPSYEEkAiZDHpAeQQhBLmZySM-8XABBBGPZINR69ZjRrDNIRelvVtGwcHVnfOpuvWzT0vW2crvCW3uulrgtbV_QNV9q6LqjNjzXtfNAlS6tzu7TtZkC7-I-i0290c9TmnJyUeunx4jD75OPxYZY9B-Pp00t2Nw4Knoo2KE2ohTEhSwQzDHkeAuNCJxpjFGUCecSY0Dnn3BSRlrzMZRGjFrLMY250GvbJ9b535ZqvNfpWLZq1q7uTijMZRenu7z7h-63CNd47LNXK2U_tNoqB2gpVO6FqK1QdhHbQ1R6yiPgPJJBG0FX-Al_LcbY</recordid><startdate>20180201</startdate><enddate>20180201</enddate><creator>Park, Hyegyeong</creator><creator>Lee, Dongwon</creator><creator>Moon, Jaekyun</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0003-3686-8891</orcidid><orcidid>https://orcid.org/0000-0003-0993-5788</orcidid></search><sort><creationdate>20180201</creationdate><title>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</title><author>Park, Hyegyeong ; Lee, Dongwon ; Moon, Jaekyun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-fd3a5dd31851d1e2b30125a8ae6e5f80b4115ab222dc4a72fb7c6ea57fb62da93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Bandwidth</topic><topic>Binary system</topic><topic>Codes</topic><topic>Distributed storage</topic><topic>Encoding</topic><topic>Error correcting codes</topic><topic>Facebook</topic><topic>factor graph</topic><topic>Graph representations</topic><topic>Graphical representations</topic><topic>Low density parity check 
codes</topic><topic>low-density parity-check (LDPC) codes</topic><topic>Maintenance engineering</topic><topic>mean-time-to-data-loss (MTTDL)</topic><topic>Nodes</topic><topic>Parity check codes</topic><topic>Reed-Solomon codes</topic><topic>Reliability analysis</topic><topic>Reliability aspects</topic><topic>Reliability engineering</topic><topic>Repair</topic><topic>repair bandwidth</topic><topic>Storage systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Park, Hyegyeong</creatorcontrib><creatorcontrib>Lee, Dongwon</creatorcontrib><creatorcontrib>Moon, Jaekyun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on communications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Park, Hyegyeong</au><au>Lee, Dongwon</au><au>Moon, Jaekyun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead</atitle><jtitle>IEEE transactions on communications</jtitle><stitle>TCOMM</stitle><date>2018-02-01</date><risdate>2018</risdate><volume>66</volume><issue>2</issue><spage>507</spage><epage>520</epage><pages>507-520</pages><issn>0090-6778</issn><eissn>1558-0857</eissn><coden>IECMBT</coden><abstract>Distributed storage systems suffer from significant repair traffic generated due to the frequent storage node failures. 
This paper shows that properly designed low-density parity-check (LDPC) codes can substantially reduce the amount of required block downloads for repair thanks to the sparse nature of their factor graph representation. In particular, with a careful construction of the factor graph, both low repair-bandwidth and high reliability can be achieved for a given code rate. First, a formula for the average repair bandwidth of LDPC codes is developed. This formula is then used to establish that the minimum repair bandwidth can be achieved by forcing a regular check node degree in the factor graph. Moreover, it is shown that given a fixed code rate, the variable node degree should also be regular to yield minimum repair bandwidth, under some reasonable minimum variable node degree constraint. It is also shown that for a given repair-bandwidth requirement, LDPC codes can yield substantially higher reliability than the currently utilized Reed-Solomon codes. Our reliability analysis is based on a formulation of the general equation for the mean-time-to-data-loss (MTTDL) associated with LDPC codes. The formulation reveals that the stopping number is closely related to the MTTDL. It is further shown that LDPC codes can be designed such that a small loss of repair-bandwidth optimality may be traded for a large improvement in erasure-correction capability and thus the MTTDL.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCOMM.2017.2769116</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-3686-8891</orcidid><orcidid>https://orcid.org/0000-0003-0993-5788</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0090-6778
ispartof	IEEE transactions on communications, 2018-02, Vol.66 (2), p.507-520
issn	0090-6778 1558-0857
language	eng
recordid	cdi_ieee_primary_8094003
source	IEEE Electronic Library (IEL)
subjects	Bandwidth; Binary system; Codes; Distributed storage; Encoding; Error correcting codes; Facebook; factor graph; Graph representations; Graphical representations; Low density parity check codes; low-density parity-check (LDPC) codes; Maintenance engineering; mean-time-to-data-loss (MTTDL); Nodes; Parity check codes; Reed-Solomon codes; Reliability analysis; Reliability aspects; Reliability engineering; Repair; repair bandwidth; Storage systems
title	LDPC Code Design for Distributed Storage: Balancing Repair Bandwidth, Reliability, and Storage Overhead
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T07%3A45%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LDPC%20Code%20Design%20for%20Distributed%20Storage:%20Balancing%20Repair%20Bandwidth,%20Reliability,%20and%20Storage%20Overhead&rft.jtitle=IEEE%20transactions%20on%20communications&rft.au=Park,%20Hyegyeong&rft.date=2018-02-01&rft.volume=66&rft.issue=2&rft.spage=507&rft.epage=520&rft.pages=507-520&rft.issn=0090-6778&rft.eissn=1558-0857&rft.coden=IECMBT&rft_id=info:doi/10.1109/TCOMM.2017.2769116&rft_dat=%3Cproquest_RIE%3E2174490403%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174490403&rft_id=info:pmid/&rft_ieee_id=8094003&rfr_iscdi=true