Error-Correcting Codes for Noisy Duplication Channels
Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data stora...
Published in: | IEEE transactions on information theory 2021-06, Vol.67 (6), p.3452-3463 |
---|---|
Main authors: | Tang, Yuanyuan ; Farnoud, Farzad |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 3463 |
---|---|
container_issue | 6 |
container_start_page | 3452 |
container_title | IEEE transactions on information theory |
container_volume | 67 |
creator | Tang, Yuanyuan ; Farnoud, Farzad |
description | Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length $k$. An exact duplication inserts a copy of a substring of length $k$ of the sequence immediately after that substring, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{ACG}T}$, where $k=3$, while a noisy duplication inserts a copy suffering from substitution noise, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{A \color{Red}{T}}GT}$. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only. |
doi_str_mv | 10.1109/TIT.2021.3059095 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2530111105</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9353552</ieee_id><sourcerecordid>2530111105</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-fc1a6e810dd16ed9c1e84ca7de6828f4985c40fefec9a89e3fa4feccfac8cf0f3</originalsourceid><addsrcrecordid>eNo9kEFLAzEQRoMoWKt3wcuC562TTWabHGVttVD0sp5DyE50S93UZHvovzelxbnMDLxvBh5j9xxmnIN-alftrIKKzwSgBo0XbMIR56WuUV6yCQBXpZZSXbOblDZ5lcirCcNFjCGWTYiR3NgPX0UTOkqFD7F4D306FC_73bZ3duzDUDTfdhhom27ZlbfbRHfnPmWfy0XbvJXrj9dV87wunRBiLL3jtibFoet4TZ12nJR0dt5RrSrlpVboJHjy5LRVmoS3Ms_OW6ecBy-m7PF0dxfD757SaDZhH4f80lQogOcCzBScKBdDSpG82cX-x8aD4WCOckyWY45yzFlOjjycIj0R_eNaoECsxB_N12Dc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2530111105</pqid></control><display><type>article</type><title>Error-Correcting Codes for Noisy Duplication Channels</title><source>IEEE Electronic Library (IEL)</source><creator>Tang, Yuanyuan ; Farnoud, Farzad</creator><creatorcontrib>Tang, Yuanyuan ; Farnoud, Farzad</creatorcontrib><description><![CDATA[Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula>. 
An exact duplication inserts a copy of a substring of length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> of the sequence immediately after that substring, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {ACG}T} </tex-math></inline-formula>, where <inline-formula> <tex-math notation="LaTeX">k=3 </tex-math></inline-formula>, while a noisy duplication inserts a copy suffering from substitution noise, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {A \color {Red}{T}}GT} </tex-math></inline-formula>. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.]]></description><identifier>ISSN: 0018-9448</identifier><identifier>EISSN: 1557-9654</identifier><identifier>DOI: 10.1109/TIT.2021.3059095</identifier><identifier>CODEN: IETTAW</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Asymptotic properties ; Codes ; Data storage ; DNA ; DNA storage ; Error correcting codes ; Error correction ; Error correction codes ; exact tandem duplication ; Hamming distance ; Inserts ; Media ; Memory ; Noise measurement ; noisy tandem duplication ; Reproduction (copying) ; Transforms</subject><ispartof>IEEE transactions on information theory, 2021-06, Vol.67 (6), p.3452-3463</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-fc1a6e810dd16ed9c1e84ca7de6828f4985c40fefec9a89e3fa4feccfac8cf0f3</citedby><cites>FETCH-LOGICAL-c333t-fc1a6e810dd16ed9c1e84ca7de6828f4985c40fefec9a89e3fa4feccfac8cf0f3</cites><orcidid>0000-0003-2946-7782 ; 0000-0002-8684-4487</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9353552$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9353552$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Tang, Yuanyuan</creatorcontrib><creatorcontrib>Farnoud, Farzad</creatorcontrib><title>Error-Correcting Codes for Noisy Duplication Channels</title><title>IEEE transactions on information theory</title><addtitle>TIT</addtitle><description><![CDATA[Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula>. 
An exact duplication inserts a copy of a substring of length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> of the sequence immediately after that substring, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {ACG}T} </tex-math></inline-formula>, where <inline-formula> <tex-math notation="LaTeX">k=3 </tex-math></inline-formula>, while a noisy duplication inserts a copy suffering from substitution noise, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {A \color {Red}{T}}GT} </tex-math></inline-formula>. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. 
We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.]]></description><subject>Asymptotic properties</subject><subject>Codes</subject><subject>Data storage</subject><subject>DNA</subject><subject>DNA storage</subject><subject>Error correcting codes</subject><subject>Error correction</subject><subject>Error correction codes</subject><subject>exact tandem duplication</subject><subject>Hamming distance</subject><subject>Inserts</subject><subject>Media</subject><subject>Memory</subject><subject>Noise measurement</subject><subject>noisy tandem duplication</subject><subject>Reproduction (copying)</subject><subject>Transforms</subject><issn>0018-9448</issn><issn>1557-9654</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFLAzEQRoMoWKt3wcuC562TTWabHGVttVD0sp5DyE50S93UZHvovzelxbnMDLxvBh5j9xxmnIN-alftrIKKzwSgBo0XbMIR56WuUV6yCQBXpZZSXbOblDZ5lcirCcNFjCGWTYiR3NgPX0UTOkqFD7F4D306FC_73bZ3duzDUDTfdhhom27ZlbfbRHfnPmWfy0XbvJXrj9dV87wunRBiLL3jtibFoet4TZ12nJR0dt5RrSrlpVboJHjy5LRVmoS3Ms_OW6ecBy-m7PF0dxfD757SaDZhH4f80lQogOcCzBScKBdDSpG82cX-x8aD4WCOckyWY45yzFlOjjycIj0R_eNaoECsxB_N12Dc</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Tang, Yuanyuan</creator><creator>Farnoud, Farzad</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-2946-7782</orcidid><orcidid>https://orcid.org/0000-0002-8684-4487</orcidid></search><sort><creationdate>20210601</creationdate><title>Error-Correcting Codes for Noisy Duplication Channels</title><author>Tang, Yuanyuan ; Farnoud, Farzad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-fc1a6e810dd16ed9c1e84ca7de6828f4985c40fefec9a89e3fa4feccfac8cf0f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Asymptotic properties</topic><topic>Codes</topic><topic>Data storage</topic><topic>DNA</topic><topic>DNA storage</topic><topic>Error correcting codes</topic><topic>Error correction</topic><topic>Error correction codes</topic><topic>exact tandem duplication</topic><topic>Hamming distance</topic><topic>Inserts</topic><topic>Media</topic><topic>Memory</topic><topic>Noise measurement</topic><topic>noisy tandem duplication</topic><topic>Reproduction (copying)</topic><topic>Transforms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tang, Yuanyuan</creatorcontrib><creatorcontrib>Farnoud, Farzad</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with 
Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on information theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Tang, Yuanyuan</au><au>Farnoud, Farzad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Error-Correcting Codes for Noisy Duplication Channels</atitle><jtitle>IEEE transactions on information theory</jtitle><stitle>TIT</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>67</volume><issue>6</issue><spage>3452</spage><epage>3463</epage><pages>3452-3463</pages><issn>0018-9448</issn><eissn>1557-9654</eissn><coden>IETTAW</coden><abstract><![CDATA[Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula>. An exact duplication inserts a copy of a substring of length <inline-formula> <tex-math notation="LaTeX">k </tex-math></inline-formula> of the sequence immediately after that substring, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {ACG}T} </tex-math></inline-formula>, where <inline-formula> <tex-math notation="LaTeX">k=3 </tex-math></inline-formula>, while a noisy duplication inserts a copy suffering from substitution noise, e.g., <inline-formula> <tex-math notation="LaTeX">\mathsf {ACGT} \to \mathsf {ACG\underline {A \color {Red}{T}}GT} </tex-math></inline-formula>. 
Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.]]></abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TIT.2021.3059095</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-2946-7782</orcidid><orcidid>https://orcid.org/0000-0002-8684-4487</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9448 |
ispartof | IEEE transactions on information theory, 2021-06, Vol.67 (6), p.3452-3463 |
issn | 0018-9448 ; 1557-9654 |
language | eng |
recordid | cdi_proquest_journals_2530111105 |
source | IEEE Electronic Library (IEL) |
subjects | Asymptotic properties ; Codes ; Data storage ; DNA ; DNA storage ; Error correcting codes ; Error correction ; Error correction codes ; exact tandem duplication ; Hamming distance ; Inserts ; Media ; Memory ; Noise measurement ; noisy tandem duplication ; Reproduction (copying) ; Transforms |
title | Error-Correcting Codes for Noisy Duplication Channels |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T07%3A14%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Error-Correcting%20Codes%20for%20Noisy%20Duplication%20Channels&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Tang,%20Yuanyuan&rft.date=2021-06-01&rft.volume=67&rft.issue=6&rft.spage=3452&rft.epage=3463&rft.pages=3452-3463&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2021.3059095&rft_dat=%3Cproquest_RIE%3E2530111105%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2530111105&rft_id=info:pmid/&rft_ieee_id=9353552&rfr_iscdi=true |
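
The description field defines exact and noisy tandem duplications of length k and refers to the duplication root of a sequence. The following Python sketch only illustrates those definitions on the abstract's own example; it is not the authors' code construction or decoder. The function names, the 0-based start index `i`, and the left-to-right removal order in `duplication_root` are assumptions made for illustration.

```python
def exact_duplication(seq: str, i: int, k: int) -> str:
    """Insert a copy of seq[i:i+k] immediately after that substring."""
    block = seq[i:i + k]
    if len(block) != k:
        raise ValueError("substring of length k does not fit at index i")
    return seq[:i + k] + block + seq[i + k:]


def noisy_duplication(seq: str, i: int, k: int, pos: int, symbol: str) -> str:
    """Insert a copy of seq[i:i+k] with one substitution at offset pos.

    The inserted copy is at Hamming distance 1 from the original block.
    """
    block = list(seq[i:i + k])
    if len(block) != k:
        raise ValueError("substring of length k does not fit at index i")
    if block[pos] == symbol:
        raise ValueError("substituted symbol must differ from the original")
    block[pos] = symbol
    return seq[:i + k] + "".join(block) + seq[i + k:]


def duplication_root(seq: str, k: int) -> str:
    """Remove exact tandem repeats of length k until none remain.

    Assumption of this sketch: repeats are removed greedily from the left.
    """
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 2 * k + 1):
            if seq[i:i + k] == seq[i + k:i + 2 * k]:
                # Drop the second copy of the repeated block.
                seq = seq[:i + k] + seq[i + 2 * k:]
                changed = True
                break
    return seq


if __name__ == "__main__":
    exact = exact_duplication("ACGT", 0, 3)
    noisy = noisy_duplication("ACGT", 0, 3, 1, "T")
    print(exact, duplication_root(exact, 3))
    print(noisy, duplication_root(noisy, 3))
```

In this sketch, taking the root undoes the exact duplication and returns the original sequence, while the noisy copy is left in place because it no longer forms an exact repeat. This is consistent with the abstract's statement that duplication errors manifest in the root of affected sequences.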