Correcting a substring edit error of bounded length

Bibliographic details
Published in: IEEE Transactions on Communications, 2024-07, p. 1-1
Main authors: Tang, Yuanyuan; Motamen, Sarvin; Lou, Hao; Whritenour, Kallie; Wang, Shuche; Gabrys, Ryan; Farnoud, Farzad
Format: Article
Language: English
Description
Summary: Localized errors, which occur in windows of bounded length, are common in a range of applications. Such errors can be modeled as k-substring edits, which replace one substring with another string, both of length at most k. This generalizes errors such as localized deletions or burst substitutions studied in the literature. In this paper, we show through statistical analysis of real data that substring edits describe differences between related documents better than independent edits do, and thus commonly arise in problems related to data synchronization. We also show that for the dataset under study, assuming codes exist that achieve the Gilbert-Varshamov (GV) bound, substring-edit-correcting codes can synchronize two documents with much lower overhead than general indel/substitution-correcting codes. Furthermore, given a constant k, we construct binary codes of length n for correcting a single k-substring edit that achieve the GV bound and thus have redundancy of asymptotically 2 log n, compared to 4k log n, the lowest redundancy achievable by an existing code for this problem. The time complexities of both encoding and decoding are polynomial in n.
ISSN: 0090-6778, 1558-0857
DOI: 10.1109/TCOMM.2024.3420721
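
Illustration: the error model in the summary can be sketched in a few lines of Python. This is only a reading aid and not the code construction from the article. The function names `apply_k_substring_edit` and `redundancy_bits` are hypothetical, the logarithm is assumed to be base 2, and only the leading-order terms quoted in the summary (2 log n and 4k log n) are computed.

```python
import math


def apply_k_substring_edit(x, start, old_len, new, k):
    """Replace x[start:start+old_len] with `new`.

    Both the removed substring and the inserted string must have
    length at most k (the k-substring edit model of the summary).
    """
    if not (0 <= old_len <= k and len(new) <= k):
        raise ValueError("both substrings must have length at most k")
    if start < 0 or start + old_len > len(x):
        raise ValueError("edited window must lie inside x")
    return x[:start] + new + x[start + old_len:]


def redundancy_bits(n, k):
    """Leading-order redundancy terms from the summary (base-2 logs assumed).

    Returns (2 log n, 4 k log n); lower-order terms are ignored.
    """
    return 2 * math.log2(n), 4 * k * math.log2(n)


x = "0110100"
# A localized deletion: remove 2 symbols, insert nothing.
deleted = apply_k_substring_edit(x, start=2, old_len=2, new="", k=2)
# A burst substitution: replace 2 symbols with 2 other symbols.
substituted = apply_k_substring_edit(x, start=1, old_len=2, new="00", k=2)
print(deleted, substituted)
print(redundancy_bits(1024, 3))
```

Both special cases named in the summary (localized deletions and burst substitutions) are instances of the same operation, which is why a single function covers them.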