Minimal Absent Words on Run-Length Encoded Strings
A string $w$ is called a minimal absent word (MAW) for another string $T$ if $w$ does not occur (as a substring) in $T$ and any proper substring of $w$ occurs in $T$. State-of-the-art data structures for reporting the set $\mathsf{MAW}(T)$ of MAWs from a given string $T$ of length $n$ require $O(n)$...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A string $w$ is called a minimal absent word (MAW) for another string $T$ if
$w$ does not occur (as a substring) in $T$ and any proper substring of $w$
occurs in $T$. State-of-the-art data structures for reporting the set
$\mathsf{MAW}(T)$ of MAWs from a given string $T$ of length $n$ require $O(n)$
space, can be built in $O(n)$ time, and can report all MAWs in
$O(|\mathsf{MAW}(T)|)$ time upon a query. This paper initiates the problem of
computing MAWs from a compressed representation of a string. In particular, we
focus on the most basic compressed representation of a string, run-length
encoding (RLE), which represents each maximal run of the same characters $a$ by
$a^p$ where $p$ is the length of the run. Let $m$ be the RLE-size of string
$T$. After categorizing the MAWs into five disjoint sets $\mathcal{M}_1$,
$\mathcal{M}_2$, $\mathcal{M}_3$, $\mathcal{M}_4$, $\mathcal{M}_5$ using RLE,
we present matching upper and lower bounds for the number of MAWs in
$\mathcal{M}_i$ for $i = 1,2,4,5$ in terms of RLE-size $m$, except for
$\mathcal{M}_3$ whose size is unbounded by $m$. We then present a compact
$O(m)$-space data structure that can report all MAWs in optimal
$O(|\mathsf{MAW}(T)|)$ time. |
---|---|
DOI: | 10.48550/arxiv.2202.13591 |