Error-Correcting Codes for Short Tandem Duplication and Edit Errors
Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting t...
Published in: | IEEE transactions on information theory 2022-02, Vol.68 (2), p.871-880 |
---|---|
Main authors: | Tang, Yuanyuan ; Farnoud, Farzad |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 880 |
---|---|
container_issue | 2 |
container_start_page | 871 |
container_title | IEEE transactions on information theory |
container_volume | 68 |
creator | Tang, Yuanyuan ; Farnoud, Farzad |
description | Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA. |
doi_str_mv | 10.1109/TIT.2021.3125724 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2621792950</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9605683</ieee_id><sourcerecordid>2621792950</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-3d35d09c2317bdd460adbc827c1d827c4982857ddb0ce2ba690e2378d40d7b093</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhoMoWKt3wUvA89bJ1yY5ylq1UPDgeg67mVS3tJuabA_-e7e2eJlhhuedgYeQWwYzxsA-1It6xoGzmWBcaS7PyIQppQtbKnlOJgDMFFZKc0mucl6Po1SMT0g1TymmooopBT90_SetIoZMVzHR96-YBlo3PYYtfdrvNp1vhi72dNzQOXYD_Qvna3KxajY53Jz6lHw8z-vqtVi-vSyqx2XhhRBDIVAoBOu5YLpFlCU02HrDtWd4qNIabpRGbMEH3jalhcCFNigBdQtWTMn98e4uxe99yINbx33qx5eOl5xpy62CkYIj5VPMOYWV26Vu26Qfx8AdVLlRlTuocidVY-TuGOlCCP-4LUGVRohfoBRjiA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2621792950</pqid></control><display><type>article</type><title>Error-Correcting Codes for Short Tandem Duplication and Edit Errors</title><source>IEEE Electronic Library (IEL)</source><creator>Tang, Yuanyuan ; Farnoud, Farzad</creator><creatorcontrib>Tang, Yuanyuan ; Farnoud, Farzad</creatorcontrib><description>Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. 
However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.</description><identifier>ISSN: 0018-9448</identifier><identifier>EISSN: 1557-9654</identifier><identifier>DOI: 10.1109/TIT.2021.3125724</identifier><identifier>CODEN: IETTAW</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Codes ; Data storage ; DNA ; DNA data storage ; duplication errors ; edit errors ; Error correcting codes ; Error correction ; Error correction codes ; Gene sequencing ; Lower bounds ; Media ; Memory ; Noise measurement ; Reproduction (copying) ; Sequential analysis ; Task analysis</subject><ispartof>IEEE transactions on information theory, 2022-02, Vol.68 (2), p.871-880</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-3d35d09c2317bdd460adbc827c1d827c4982857ddb0ce2ba690e2378d40d7b093</citedby><cites>FETCH-LOGICAL-c333t-3d35d09c2317bdd460adbc827c1d827c4982857ddb0ce2ba690e2378d40d7b093</cites><orcidid>0000-0003-2946-7782 ; 0000-0002-8684-4487</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9605683$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9605683$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Tang, Yuanyuan</creatorcontrib><creatorcontrib>Farnoud, Farzad</creatorcontrib><title>Error-Correcting Codes for Short Tandem Duplication and Edit Errors</title><title>IEEE transactions on information theory</title><addtitle>TIT</addtitle><description>Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. 
However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.</description><subject>Codes</subject><subject>Data storage</subject><subject>DNA</subject><subject>DNA data storage</subject><subject>duplication errors</subject><subject>edit errors</subject><subject>Error correcting codes</subject><subject>Error correction</subject><subject>Error correction codes</subject><subject>Gene sequencing</subject><subject>Lower bounds</subject><subject>Media</subject><subject>Memory</subject><subject>Noise measurement</subject><subject>Reproduction (copying)</subject><subject>Sequential analysis</subject><subject>Task analysis</subject><issn>0018-9448</issn><issn>1557-9654</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhoMoWKt3wUvA89bJ1yY5ylq1UPDgeg67mVS3tJuabA_-e7e2eJlhhuedgYeQWwYzxsA-1It6xoGzmWBcaS7PyIQppQtbKnlOJgDMFFZKc0mucl6Po1SMT0g1TymmooopBT90_SetIoZMVzHR96-YBlo3PYYtfdrvNp1vhi72dNzQOXYD_Qvna3KxajY53Jz6lHw8z-vqtVi-vSyqx2XhhRBDIVAoBOu5YLpFlCU02HrDtWd4qNIabpRGbMEH3jalhcCFNigBdQtWTMn98e4uxe99yINbx33qx5eOl5xpy62CkYIj5VPMOYWV26Vu26Qfx8AdVLlRlTuocidVY-TuGOlCCP-4LUGVRohfoBRjiA</recordid><startdate>20220201</startdate><enddate>20220201</enddate><creator>Tang, Yuanyuan</creator><creator>Farnoud, Farzad</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-2946-7782</orcidid><orcidid>https://orcid.org/0000-0002-8684-4487</orcidid></search><sort><creationdate>20220201</creationdate><title>Error-Correcting Codes for Short Tandem Duplication and Edit Errors</title><author>Tang, Yuanyuan ; Farnoud, Farzad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-3d35d09c2317bdd460adbc827c1d827c4982857ddb0ce2ba690e2378d40d7b093</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Codes</topic><topic>Data storage</topic><topic>DNA</topic><topic>DNA data storage</topic><topic>duplication errors</topic><topic>edit errors</topic><topic>Error correcting codes</topic><topic>Error correction</topic><topic>Error correction codes</topic><topic>Gene sequencing</topic><topic>Lower bounds</topic><topic>Media</topic><topic>Memory</topic><topic>Noise measurement</topic><topic>Reproduction (copying)</topic><topic>Sequential analysis</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tang, Yuanyuan</creatorcontrib><creatorcontrib>Farnoud, Farzad</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on information theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tang, Yuanyuan</au><au>Farnoud, Farzad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Error-Correcting Codes for Short Tandem Duplication and Edit Errors</atitle><jtitle>IEEE transactions on information theory</jtitle><stitle>TIT</stitle><date>2022-02-01</date><risdate>2022</risdate><volume>68</volume><issue>2</issue><spage>871</spage><epage>880</epage><pages>871-880</pages><issn>0018-9448</issn><eissn>1557-9654</eissn><coden>IETTAW</coden><abstract>Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error-correction a challenging task. This paper aims to address simultaneously correcting two types of errors, namely, short tandem duplication and edit errors, where an edit error may be a substitution, deletion, or insertion. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one edit error. Because an edited symbol can be duplicated many times (as part of substrings of various lengths), a single edit can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error-correction possible. We construct a code for correcting the aforementioned errors and provide lower bounds for its rate. 
Compared to optimal codes correcting only duplication errors, numerical results show that the asymptotic cost of protecting against an additional edit is only 0.003 bits/symbol when the alphabet has size 4, an important case corresponding to data storage in DNA.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TIT.2021.3125724</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-2946-7782</orcidid><orcidid>https://orcid.org/0000-0002-8684-4487</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9448 |
ispartof | IEEE transactions on information theory, 2022-02, Vol.68 (2), p.871-880 |
issn | 0018-9448 ; 1557-9654 |
language | eng |
recordid | cdi_proquest_journals_2621792950 |
source | IEEE Electronic Library (IEL) |
subjects | Codes ; Data storage ; DNA ; DNA data storage ; duplication errors ; edit errors ; Error correcting codes ; Error correction ; Error correction codes ; Gene sequencing ; Lower bounds ; Media ; Memory ; Noise measurement ; Reproduction (copying) ; Sequential analysis ; Task analysis |
title | Error-Correcting Codes for Short Tandem Duplication and Edit Errors |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T22%3A29%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Error-Correcting%20Codes%20for%20Short%20Tandem%20Duplication%20and%20Edit%20Errors&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Tang,%20Yuanyuan&rft.date=2022-02-01&rft.volume=68&rft.issue=2&rft.spage=871&rft.epage=880&rft.pages=871-880&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2021.3125724&rft_dat=%3Cproquest_RIE%3E2621792950%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2621792950&rft_id=info:pmid/&rft_ieee_id=9605683&rfr_iscdi=true |
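
The error model named in the description field above (tandem duplications of length at most 3, plus one edit) can be illustrated with a small Python sketch. This is only a toy illustration of the channel, not the code construction or the preprocessing step from the paper. The function names `tandem_duplicate`, `remove_short_repeats`, and `substitute` are hypothetical and chosen for this example, and the greedy left-to-right removal order is an assumption made for simplicity.

```python
def tandem_duplicate(word: str, start: int, length: int) -> str:
    """Insert a copy of word[start:start+length] right after itself."""
    if not 1 <= length <= 3:
        raise ValueError("this sketch only models repeats of length 1 to 3")
    if start < 0 or start + length > len(word):
        raise ValueError("substring out of range")
    end = start + length
    return word[:end] + word[start:end] + word[end:]


def remove_short_repeats(word: str, max_len: int = 3) -> str:
    """Greedily delete one copy of any tandem repeat of length <= max_len
    until no such repeat remains (assumed removal order: shortest, leftmost)."""
    changed = True
    while changed:
        changed = False
        for length in range(1, max_len + 1):
            for i in range(len(word) - 2 * length + 1):
                if word[i:i + length] == word[i + length:i + 2 * length]:
                    # Drop the second copy of the repeated block.
                    word = word[:i + length] + word[i + 2 * length:]
                    changed = True
                    break
            if changed:
                break
    return word


def substitute(word: str, pos: int, symbol: str) -> str:
    """Apply a single substitution edit at position pos."""
    return word[:pos] + symbol + word[pos + 1:]


if __name__ == "__main__":
    stored = "ACGT"                              # hypothetical stored word
    noisy = tandem_duplicate(stored, 1, 2)       # duplicates "CG"
    noisy = tandem_duplicate(noisy, 0, 1)        # duplicates "A"
    print(noisy, "->", remove_short_repeats(noisy))

    edited = substitute(tandem_duplicate(stored, 1, 2), 1, "T")
    print(edited, "->", remove_short_repeats(edited))
```

In the first case, removing short repeats recovers the stored word. In the second, a single substitution breaks the repeat structure, so the irreducible word that remains differs from the stored one, which is the kind of interaction between duplications and an edit that the abstract describes.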