Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation
Published in: | ACM Transactions on Database Systems, 2017-12, Vol. 42 (4), pp. 1-44 |
---|---|
Authors: | , , |
Format: | Article |
Language: | English |
Abstract: | Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension severely worsens the complexity of basic operations and, in particular, makes membership NP-hard, which is unacceptable in most practical scenarios.
In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial algorithms for membership checking over conflict-free REs; they differ in the optimization techniques they adopt and in the operators they support. As a particular application, we generalize the approach to check membership of Extensible Markup Language (XML) trees in a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definitions) schemas.
The results of an extensive experimental analysis validate the efficiency of the presented membership checking techniques. |
ISSN: | 0362-5915 (print); 1557-4644 (online) |
DOI: | 10.1145/3132701 |
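The abstract attributes the NP-hardness of membership to interleaving and unordered concatenation, and notes that restricting the class of expressions makes efficient checking possible. As a rough intuition only, the sketch below handles a deliberately restricted special case that is an assumption of this example, not the article's conflict-free class or its algorithms: an interleaving in which every symbol appears in exactly one factor with a counting bound, written here in a hypothetical notation such as `a[1..1] & b[0..*] & c[2..3]`. Because no symbol is shared between factors and the factors may interleave freely, membership reduces to counting symbols in one pass and comparing each count with its bounds. The function name `member_counted_interleaving` and the dictionary encoding of bounds are illustrative choices.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical encoding of an interleaving of counted symbols such as
#   a[1..1] & b[0..*] & c[2..3]
# as {symbol: (min, max)}, where max=None means "unbounded".
Bounds = Dict[str, Tuple[int, Optional[int]]]


def member_counted_interleaving(word: str, bounds: Bounds) -> bool:
    """Return True if `word` matches the interleaving described by `bounds`.

    Each symbol occurs in exactly one factor and factors may interleave in
    any order, so only the number of occurrences of each symbol matters.
    Cost: one pass over `word` plus one check per symbol, O(|word| + |bounds|).
    """
    counts = Counter(word)  # single linear pass over the input word

    # A symbol that does not occur in the expression can never match.
    if any(sym not in bounds for sym in counts):
        return False

    # Every declared symbol (including absent ones, count 0) must respect its bounds.
    for sym, (lo, hi) in bounds.items():
        n = counts.get(sym, 0)
        if n < lo or (hi is not None and n > hi):
            return False
    return True


if __name__ == "__main__":
    spec = {"a": (1, 1), "b": (0, None), "c": (2, 3)}
    print(member_counted_interleaving("cbacb", spec))  # a=1, b=2, c=2
    print(member_counted_interleaving("acccc", spec))  # c=4 exceeds its bound
```

Ordered concatenation is deliberately left out: once factors must appear in a fixed order, counting alone no longer suffices, and additional ordering constraints between symbols would have to be checked.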