Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Edit Distance

Bibliographic details
Published in:	IEEE transactions on information theory 2022-10, Vol.68 (10), p.6790-6801
Main authors:	Grigorescu, Elena; Sudan, Madhu; Zhu, Minshen
Format:	Article
Language:	eng
Keywords:	Algorithms; Codes; complex analysis; Complexity theory; Decoding; DNA; Hamming distance; Information theory; Mathematical analysis; mean-based algorithms; multiplicity of zeros; Number theory; Polynomials; Reconstruction; Strings; Task analysis; Trace reconstruction; Upper bound

container_end_page	6801
container_issue	10
container_start_page	6790
container_title	IEEE transactions on information theory
container_volume	68
creator	Grigorescu, Elena ; Sudan, Madhu ; Zhu, Minshen
description	Trace reconstruction considers the task of recovering an unknown string $\mathbf{x}\in\{0,1\}^{n}$ given a number of independent "traces", i.e., subsequences of $\mathbf{x}$ obtained by randomly and independently deleting every symbol of $\mathbf{x}$ with some probability $p$. The information-theoretic limit of the number of traces needed to recover a string of length $n$ is still unknown. This limit is essentially the same as the number of traces needed to determine, given strings $\mathbf{x}$ and $\mathbf{y}$ and traces of one of them, which string is the source. The most-studied class of algorithms for the worst-case version of the problem is that of "mean-based" algorithms. These are a restricted class of distinguishers that only use the mean value of each coordinate on the given samples. In this work we study limitations of mean-based algorithms on strings at small Hamming or edit distance. We show that, on the one hand, distinguishing strings that are nearby in Hamming distance is "easy" for such distinguishers. On the other hand, we show that distinguishing strings that are nearby in edit distance is "hard" for mean-based algorithms. Along the way, we also describe a connection to the famous Prouhet-Tarry-Escott (PTE) problem, which shows a barrier to finding explicit hard-to-distinguish strings: namely, such strings would imply explicit short solutions to the PTE problem, a well-known difficult problem in number theory. Furthermore, we show that the converse is also true; thus, finding explicit solutions to the PTE problem is equivalent to finding explicit strings that are hard to distinguish by mean-based algorithms. Our techniques rely on complex-analysis arguments that involve careful trigonometric estimates, and on algebraic techniques that include applications of Descartes' rule of signs for polynomials over the reals.
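As a rough illustration of the objects named in the abstract, the Python sketch below simulates the deletion channel, computes the expected mean trace in closed form, and applies a simple mean-based test. It is a toy under stated assumptions, not the authors' construction or code. It assumes traces are zero-padded to length $n$, writes $q = 1 - p$ for the retention probability, and uses the standard identity $\mathbb{E}[t_j] = \sum_{i \ge j} x_i \binom{i}{j} q^{j+1} p^{i-j}$ (0-indexed). The helper `power_sums_agree` checks the classical fact that $\sum_i (x_i - y_i)\, i^m = 0$ for $m = 0, \dots, k-1$ (a Prouhet-Tarry-Escott-type condition on the positions of the 1s) holds exactly when $(z-1)^k$ divides $D(z) = \sum_i (x_i - y_i) z^i$. All function names, the L1 decision rule, and the parameter values are hypothetical.

```python
import random
from math import comb


def sample_trace(x, p, rng):
    """Deletion channel: drop each bit of x independently with probability p,
    then pad the surviving subsequence with zeros back to length len(x)."""
    kept = [b for b in x if rng.random() >= p]
    return kept + [0] * (len(x) - len(kept))


def expected_mean_trace(x, p):
    """Closed form for the zero-padded trace: bit x_i lands at position j when it
    survives and exactly j of the i earlier bits survive, so
    E[t_j] = sum_{i>=j} x_i * C(i, j) * q^(j+1) * p^(i-j), with q = 1 - p."""
    q = 1.0 - p
    n = len(x)
    return [
        sum(x[i] * comb(i, j) * q ** (j + 1) * p ** (i - j) for i in range(j, n))
        for j in range(n)
    ]


def mean_based_guess(traces, x, y, p):
    """Hypothetical mean-based distinguisher: it looks only at the per-coordinate
    empirical mean of the traces and picks the candidate whose expected mean
    trace is closer in L1 distance."""
    n = len(x)
    emp = [sum(t[j] for t in traces) / len(traces) for j in range(n)]
    dx = sum(abs(a - b) for a, b in zip(emp, expected_mean_trace(x, p)))
    dy = sum(abs(a - b) for a, b in zip(emp, expected_mean_trace(y, p)))
    return "x" if dx <= dy else "y"


def power_sums_agree(x, y, k):
    """True iff sum_i (x_i - y_i) * i^m == 0 for m = 0..k-1, i.e. the positions of
    the 1s in x and y have equal power sums up to degree k-1; equivalently,
    (z - 1)^k divides D(z) = sum_i (x_i - y_i) z^i."""
    return all(
        sum((a - b) * i ** m for i, (a, b) in enumerate(zip(x, y))) == 0
        for m in range(k)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1, 0, 0, 0, 1, 1, 0]  # 1s at positions {0, 4, 5}
    y = [0, 1, 1, 0, 0, 0, 1]  # 1s at positions {1, 2, 6}
    p = 0.3
    traces = [sample_trace(x, p, rng) for _ in range(20000)]
    print(mean_based_guess(traces, x, y, p))  # expected to print "x" here
    print(power_sums_agree(x, y, 3), power_sums_agree(x, y, 4))  # True False
```

The pair `x`, `y` encodes the small PTE solution {0, 4, 5} and {1, 2, 6} (equal sums 9 and equal sums of squares 41, but different sums of cubes). It is chosen only to exercise `power_sums_agree`; at p = 0.3 it is easy for the mean-based test, since the expected first trace coordinates already differ by about 0.43.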
doi_str_mv	10.1109/TIT.2022.3168624
format	Article
publisher	New York: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
coden	IETTAW
stitle	TIT
date	2022-10-01
tpages	12
toplevel	peer_reviewed
orcidid	https://orcid.org/0000-0003-1927-6085 ; https://orcid.org/0000-0003-3718-6489
linktohtml	https://ieeexplore.ieee.org/document/9759421
fulltext	fulltext_linktorsrc
identifier	ISSN: 0018-9448
ispartof	IEEE transactions on information theory, 2022-10, Vol.68 (10), p.6790-6801
issn	0018-9448 1557-9654
language	eng
recordid	cdi_ieee_primary_9759421
source	IEEE Electronic Library (IEL)
subjects	Algorithms ; Codes ; complex analysis ; Complexity theory ; Decoding ; DNA ; Hamming distance ; Information theory ; Mathematical analysis ; mean-based algorithms ; multiplicity of zeros ; Number theory ; Polynomials ; Reconstruction ; Strings ; Task analysis ; Trace reconstruction ; Upper bound
title	Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Edit Distance
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T07%3A15%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Limitations%20of%20Mean-Based%20Algorithms%20for%20Trace%20Reconstruction%20at%20Small%20Edit%20Distance&rft.jtitle=IEEE%20transactions%20on%20information%20theory&rft.au=Grigorescu,%20Elena&rft.date=2022-10-01&rft.volume=68&rft.issue=10&rft.spage=6790&rft.epage=6801&rft.pages=6790-6801&rft.issn=0018-9448&rft.eissn=1557-9654&rft.coden=IETTAW&rft_id=info:doi/10.1109/TIT.2022.3168624&rft_dat=%3Cproquest_RIE%3E2714954194%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2714954194&rft_id=info:pmid/&rft_ieee_id=9759421&rfr_iscdi=true