Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation

Bibliographic details
Published in: ACM Transactions on Database Systems, 2017-12, Vol. 42 (4), p. 1-44
Main authors: Colazzo, Dario; Ghelli, Giorgio; Sartiani, Carlo
Format: Article
Language: English
Online access: Full text
Description
Summary: Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios. In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial algorithms for membership checking over conflict-free REs. The algorithms are all polynomial and differ in terms of adopted optimization techniques and in the kind of supported operators. As a particular application, we generalize the approach to check membership of Extensible Markup Language trees into a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definitions) schemas. Results about an extensive experimental analysis validate the efficiency of the presented membership checking techniques.
ISSN: 0362-5915, 1557-4644
DOI: 10.1145/3132701