Correcting a substring edit error of bounded length
Published in: | IEEE Transactions on Communications, 2024-07, p.1-1 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Localized errors, which occur within windows of bounded length, are common in a range of applications. Such errors can be modeled as k-substring edits, each of which replaces one substring with another string, where both have length at most k. This generalizes errors such as localized deletions and burst substitutions studied in the literature. In this paper, we show through statistical analysis of real data that substring edits describe the differences between related documents better than independent edits do, and thus commonly arise in problems related to data synchronization. We also show that, for the dataset under study and assuming that codes achieving the Gilbert-Varshamov (GV) bound exist, substring-edit-correcting codes can synchronize two documents with much lower overhead than general indel/substitution-correcting codes. Furthermore, given a constant k, we construct binary codes of length n for correcting a single k-substring edit that achieve the GV bound and thus have a redundancy of asymptotically 2 log n, compared to 4k log n, the lowest redundancy achieved by an existing code for this problem. The time complexities of both encoding and decoding are polynomial in n. |
---|---|
ISSN: | 0090-6778 1558-0857 |
DOI: | 10.1109/TCOMM.2024.3420721 |
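
The k-substring edit model described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `apply_substring_edit` and `leading_redundancy_bits` are hypothetical, logarithms are assumed to be base 2, and the second function only evaluates the leading terms quoted in the abstract (2 log n and 4k log n), ignoring lower-order terms.

```python
import math


def apply_substring_edit(x: str, start: int, old_len: int, new: str, k: int) -> str:
    """Replace x[start:start+old_len] with `new`; both lengths must be at most k."""
    if start < 0 or old_len < 0 or start + old_len > len(x):
        raise ValueError("substring window falls outside the string")
    if old_len > k or len(new) > k:
        raise ValueError("both substrings must have length at most k")
    return x[:start] + new + x[start + old_len:]


def leading_redundancy_bits(n: int, k: int) -> tuple:
    """Leading terms from the abstract: (2 log n, 4k log n), assuming log base 2."""
    log_n = math.log2(n)
    return 2 * log_n, 4 * k * log_n


x = "1011001110"

# A burst deletion: a window of 3 symbols is replaced by the empty string.
burst_deletion = apply_substring_edit(x, start=2, old_len=3, new="", k=3)

# A burst substitution: a window of 3 symbols is replaced by 3 other symbols.
burst_substitution = apply_substring_edit(x, start=2, old_len=3, new="000", k=3)

# A general substring edit: the window and its replacement differ in length.
mixed_edit = apply_substring_edit(x, start=2, old_len=3, new="01", k=3)

print(burst_deletion, burst_substitution, mixed_edit)
print(leading_redundancy_bits(n=1024, k=3))
```

The three calls show how localized deletions and burst substitutions arise as special cases of one edit type. For n = 1024 and k = 3, the leading terms come to 20 bits versus 120 bits under the stated assumptions.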