Efficient average-case population recovery in the presence of insertions and deletions

Bibliographic details
Published in:	arXiv.org 2019-07
Main authors:	Ban, Frank; Chen, Xi; Servedio, Rocco A.; Sinha, Sandip
Format:	Article
Language:	eng
Subjects:	Algorithms; Complexity; Dependence; Polynomials; Reconstruction; Recovery; Statistical analysis; Strings
Online access:	Full text

Description
Abstract:	Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string \(x \in \{0,1\}^n\) is transmitted through a probabilistic channel that may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of \(x\). The goal is to reconstruct the original string \(x\) from independent traces of \(x\). While the best algorithms known for worst-case strings use \(\exp(O(n^{1/3}))\) traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which \(x\) is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution \(\mathcal{D}\) over \(s\) unknown source strings \(x^1,\dots,x^s \in \{0,1\}^n\), and each sample is independently generated by drawing some \(x^i\) from \(\mathcal{D}\) and returning an independent trace of \(x^i\). Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size \(s \leq \exp(\Theta(n^{1/3}))\), for a \(1-o(1)\) fraction of all \(s\)-element support sets \(\{x^1,\dots,x^s\} \subset \{0,1\}^n\), and for every distribution \(\mathcal{D}\) supported on \(\{x^1,\dots,x^s\}\), our algorithm efficiently recovers \(\mathcal{D}\) up to total variation distance \(\epsilon\) with high probability, given access to independent traces of independent draws from \(\mathcal{D}\). The algorithm runs in time \(\mathrm{poly}(n,s,1/\epsilon)\) and its sample complexity is \(\mathrm{poly}(s,1/\epsilon,\exp(\log^{1/3} n))\). This polynomial dependence on the support size \(s\) is in sharp contrast with the \emph{worst-case} version (when \(x^1,\dots,x^s\) may be any strings in \(\{0,1\}^n\)), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in \(s\).
ISSN:	2331-8422
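
Illustrative sketch (not part of the catalog record or the paper): the sampling process described in the abstract can be simulated in a few lines of Python. The abstract does not specify the exact channel, so the model below is an assumption: each source bit is deleted independently with probability delete_prob, and before each source bit uniform random bits are inserted while a coin of bias insert_prob comes up heads. The names trace, sample, total_variation, delete_prob and insert_prob are hypothetical. The code only generates samples and measures error; it does not implement the recovery algorithm.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Return one trace of the bit list x under an ASSUMED channel model."""
    out = []
    for bit in x:
        # Assumed insertion model: insert uniform random bits while a coin
        # with bias insert_prob (must be < 1) keeps coming up heads.
        while rng.random() < insert_prob:
            out.append(rng.randrange(2))
        # Deletion: the source bit survives with probability 1 - delete_prob.
        if rng.random() >= delete_prob:
            out.append(bit)
    return out


def sample(support, weights, delete_prob, insert_prob, rng):
    """Draw a source string x^i from D, then return an independent trace of it."""
    x = rng.choices(support, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 20, 3
    # Average-case setting: the support strings are chosen uniformly at random.
    support = [[rng.randrange(2) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]
    for _ in range(3):
        print(sample(support, weights, 0.1, 0.1, rng))
```

A recovery algorithm would see only the output of sample and would be judged by total_variation between its estimate and the true distribution over the support strings.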