Error-Correcting Codes for Nanopore Sequencing

Bibliographic details
Published in: IEEE Transactions on Information Theory, 2024-07, Vol. 70 (7), p. 4956-4967
Main authors: Banerjee, Anisha; Yehezkeally, Yonatan; Wachter-Zeh, Antonia; Yaakobi, Eitan
Format: Article
Language: English
Description
Abstract: Nanopore sequencing, superior to other sequencing technologies for DNA storage in multiple aspects, has recently attracted considerable attention. Its high error rates, however, demand thorough research on practical and efficient coding schemes to enable accurate recovery of stored data. To this end, we consider a simplified model of a nanopore sequencer inspired by Mao et al., incorporating intersymbol interference and measurement noise. Essentially, our channel model passes a sliding window of length $\ell$ over a $q$-ary input sequence; at each time step, the window outputs the composition of the $\ell$ enclosed symbols and then shifts by $\delta$ positions. In this context, the composition of a $q$-ary vector $\boldsymbol{x}$ specifies the number of occurrences in $\boldsymbol{x}$ of each symbol in $\{0, 1, \ldots, q-1\}$. The resulting vector of compositions, termed the read vector, may also be corrupted by $t$ substitution errors. By employing graph-theoretic techniques, we deduce that for $\delta = 1$, at least $\log \log n$ symbols of redundancy are required to correct a single ($t = 1$) substitution. Finally, for $\ell \geq 3$, we exploit some inherent characteristics of read vectors to arrive at an error-correcting code that is of optimal redundancy up to a (small) additive constant for this setting. This construction is also found to be optimal for the case of reconstruction from two noisy read vectors.
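The channel model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `composition` and `read_vector` are hypothetical, and the sketch assumes that only windows lying fully inside the input are read; the abstract does not say how the window is handled at the sequence boundaries, so the article's definition may differ there.

```python
from collections import Counter


def composition(window, q):
    """Count the occurrences of each symbol 0..q-1 in `window`."""
    counts = Counter(window)
    return tuple(counts[s] for s in range(q))


def read_vector(x, q, ell, delta=1):
    """Slide a length-`ell` window over `x` in steps of `delta`.

    Returns the list of window compositions (the read vector).
    Assumption: only windows fully contained in `x` are read.
    """
    if ell < 1 or delta < 1:
        raise ValueError("ell and delta must be positive")
    if any(not 0 <= s < q for s in x):
        raise ValueError("symbols must lie in {0, ..., q-1}")
    return [
        composition(x[i:i + ell], q)
        for i in range(0, len(x) - ell + 1, delta)
    ]


if __name__ == "__main__":
    x = [0, 1, 1, 0, 2]  # a ternary (q = 3) input sequence
    print(read_vector(x, q=3, ell=3, delta=1))
    # -> [(1, 2, 0), (1, 2, 0), (1, 1, 1)]
```

In this example the first two windows, (0, 1, 1) and (1, 1, 0), yield the same composition, which illustrates the intersymbol interference in the model: a composition records how often each symbol occurs in the window but not in which order. A substitution error in this setting replaces one entry of the returned list with a different composition.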
ISSN: 0018-9448, 1557-9654
DOI: 10.1109/TIT.2024.3380615