A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings
Main author: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Given a set $I$ of $k$ strings, their longest common subsequence (LCS) is the
longest string that is a subsequence of every string in $I$. A
data-structure for this problem preprocesses $I$ so
that the LCS of a set of query strings $Q$ with the strings of $I$ can be
computed faster. Since the problem is NP-hard for arbitrary $k$, we allow an
error in which some characters may be replaced by other characters. We define
the approximation version of the problem with an extra input $m$, which is the
length of the regular expression (regex) that describes the input. The
approximation factor is the logarithm of the number of possibilities in the
regex returned by the algorithm, divided by the logarithm of the number of
possibilities in the regex with the fewest possibilities. We then use a tree
data-structure to achieve
sublinear-time LCS queries. We also explain how the idea can be extended to the
longest increasing subsequence (LIS) problem. |
---|---|
DOI: | 10.48550/arxiv.2008.01768 |
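
To make the LCS definition in the summary concrete, the following is a minimal Python sketch of the exact LCS of a small set of strings, using the textbook dynamic program over one prefix length per string. It is not the tree data-structure or the approximation scheme from the paper. The function name `lcs_of_set` is hypothetical and chosen only for illustration. The running time grows with the product of the string lengths, which is consistent with the problem being NP-hard for arbitrary $k$.

```python
from functools import lru_cache


def lcs_of_set(strings):
    """Return one exact longest common subsequence of all given strings.

    Illustrative baseline only (hypothetical helper, not the paper's method):
    the state is one prefix length per string, so the number of states is
    the product of the string lengths.
    """
    strings = tuple(strings)
    if not strings or any(not s for s in strings):
        return ""

    @lru_cache(maxsize=None)
    def solve(prefix_lengths):
        # An empty prefix has only the empty subsequence in common.
        if any(p == 0 for p in prefix_lengths):
            return ""
        last_chars = {s[p - 1] for s, p in zip(strings, prefix_lengths)}
        if len(last_chars) == 1:
            # All prefixes end in the same character: take it and recurse.
            shorter = tuple(p - 1 for p in prefix_lengths)
            return solve(shorter) + last_chars.pop()
        # Otherwise shorten one prefix at a time and keep the longest result.
        best = ""
        for i, p in enumerate(prefix_lengths):
            candidate = solve(prefix_lengths[:i] + (p - 1,) + prefix_lengths[i + 1:])
            if len(candidate) > len(best):
                best = candidate
        return best

    return solve(tuple(len(s) for s in strings))


if __name__ == "__main__":
    print(lcs_of_set(["AGGTAB", "GXTXAYB", "GTAB"]))
```

Because the recursion depth is bounded by the total length of the inputs, this sketch is only suitable for short strings; it serves as a reference point for what a preprocessing-based structure is meant to answer faster.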