On the Maximum Number of Non-Confusable Strings Evolving Under Short Tandem Duplications

Bibliographic details
Published in: arXiv.org 2022-04
Main author: Kovačević, Mladen
Format: Article
Language: English
Online access: Full text
Description
Summary: The set of all \( q \)-ary strings that do not contain repeated substrings of length \( \leqslant\! 3 \) (i.e., that do not contain substrings of the form \( a a \), \( a b a b \), or \( a b c a b c \)) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length \( \leqslant\! 3 \). In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length \( \leqslant\! 3 \). We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the "root-uniqueness" property.
ISSN:2331-8422
DOI:10.48550/arxiv.1911.06561
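
The notions used in the summary can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (find_short_repeat, is_irreducible, tandem_duplicate, root) and the greedy left-to-right reduction order are assumptions made for illustration. The sketch checks whether a string is free of repeated substrings of length at most 3, applies one tandem duplication, and collapses repeats until none remain. The "root-uniqueness" property mentioned in the summary is what makes the result of that reduction independent of the order in which repeats are collapsed.

```python
def find_short_repeat(s, max_len=3):
    """Return (i, l) for the leftmost tandem repeat s[i:i+l] == s[i+l:i+2l]
    with 1 <= l <= max_len, or None if the string contains no such repeat."""
    for i in range(len(s)):
        for l in range(1, max_len + 1):
            if i + 2 * l <= len(s) and s[i:i + l] == s[i + l:i + 2 * l]:
                return i, l
    return None


def is_irreducible(s, max_len=3):
    """True if s has no repeated substring of length <= max_len,
    i.e. s would be a codeword of the code described in the summary."""
    return find_short_repeat(s, max_len) is None


def tandem_duplicate(s, i, l):
    """Model one mutation: insert a copy of s[i:i+l] right after itself."""
    return s[:i + l] + s[i:i + l] + s[i + l:]


def root(s, max_len=3):
    """Collapse tandem repeats of length <= max_len until none remain.
    The leftmost-first order is an arbitrary choice made for this sketch."""
    while (hit := find_short_repeat(s, max_len)) is not None:
        i, l = hit
        s = s[:i + l] + s[i + 2 * l:]  # drop the second copy of the repeat
    return s


# Usage: mutate a codeword twice, then recover it by taking the root.
codeword = "acb"
mutated = tandem_duplicate(tandem_duplicate(codeword, 0, 1), 1, 3)
recovered = root(mutated)
```

Under these assumptions, decoding amounts to computing the root of the received string: two distinct irreducible strings have distinct roots (themselves), so their descendants can never coincide.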