Generalized Unique Reconstruction from Substrings

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of p...

Detailed description

Bibliographic details
Published in: IEEE transactions on information theory 2023-09, Vol.69 (9), p.1-1
Main authors: Yehezkeally, Yonatan; Bar-Lev, Daniella; Marcovich, Sagi; Yaakobi, Eitan
Format: Article
Language: eng
container_end_page 1
container_issue 9
container_start_page 1
container_title IEEE transactions on information theory
container_volume 69
creator Yehezkeally, Yonatan
Bar-Lev, Daniella
Marcovich, Sagi
Yaakobi, Eitan
description This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of predefined lengths are read or substrings are read with no overlap for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length ℓ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of ℓ that asymptotically behave like the lower bound.
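The "all substrings of predefined lengths are read" extreme case mentioned in the description can be illustrated with a small brute-force sketch. This is only an illustration of what unique reconstruction means, not the constructions or bounds from the paper; the function names `substring_multiset` and `is_uniquely_reconstructible`, the DNA alphabet default, and the exhaustive search are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def substring_multiset(s: str, ell: int) -> Counter:
    """Multiset of all length-ell substrings of s (hypothetical helper)."""
    return Counter(s[i:i + ell] for i in range(len(s) - ell + 1))


def is_uniquely_reconstructible(s: str, ell: int, alphabet: str = "ACGT") -> bool:
    """Brute-force check (illustrative only, exponential in len(s)).

    Returns True if no other string of the same length over `alphabet`
    has the same multiset of length-ell substrings as s.
    """
    target = substring_multiset(s, ell)
    for cand in product(alphabet, repeat=len(s)):
        t = "".join(cand)
        if t != s and substring_multiset(t, ell) == target:
            return False
    return True


if __name__ == "__main__":
    # "ACAGA" and "AGACA" share the same length-2 substrings,
    # so reads of length 2 cannot tell them apart.
    print(is_uniquely_reconstructible("ACAGA", 2))
    # Longer reads remove the ambiguity for this string.
    print(is_uniquely_reconstructible("ACAGA", 3))
```

The example shows why the read length ℓ matters: short reads can leave several candidate strings consistent with the same observations, while longer reads can pin down a single one. A reconstruction code restricts the stored strings so that this ambiguity never arises.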
doi_str_mv 10.1109/TIT.2023.3269124
format Article
coden IETTAW
eissn 1557-9654
link https://ieeexplore.ieee.org/document/10106477
orcidid https://orcid.org/0000-0003-4165-2024
https://orcid.org/0000-0003-1652-9761
https://orcid.org/0000-0001-6766-1450
https://orcid.org/0000-0002-9851-5234
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
toplevel peer_reviewed
online_resources
fulltext fulltext_linktorsrc
identifier ISSN: 0018-9448
ispartof IEEE transactions on information theory, 2023-09, Vol.69 (9), p.1-1
issn 0018-9448
1557-9654
language eng
recordid cdi_crossref_primary_10_1109_TIT_2023_3269124
source IEEE Electronic Library (IEL)
subjects Asymptotic properties
Codes
Data storage
Gene sequencing
Lower bounds
Polymers
Radio frequency
Reconstruction
Redundancy
Sequential analysis
Strings
Symbols
Technological innovation
Upper bounds
title Generalized Unique Reconstruction from Substrings
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T16%3A44%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generalized%20Unique%20Reconstruction%20from%20Substrings&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Yehezkeally,%20Yonatan&rft.date=2023-09-01&rft.volume=69&rft.issue=9&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2023.3269124&rft_dat=%3Cproquest_RIE%3E2855276781%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2855276781&rft_id=info:pmid/&rft_ieee_id=10106477&rfr_iscdi=true