High-Performance Row Pattern Recognition Using Joins

The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streami...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the VLDB Endowment 2023-01, Vol.16 (5), p.1181-1195
Hauptverfasser: Zhu, Erkang, Huang, Silu, Chaudhuri, Surajit
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
container_end_page 1195
container_issue 5
container_start_page 1181
container_title Proceedings of the VLDB Endowment
container_volume 16
creator Zhu, Erkang
Huang, Silu
Chaudhuri, Surajit
description The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios overlook important optimization opportunities. We propose a new approach to use Join to speed up row pattern recognition in historical analysis scenarios for relational systems. Implemented as a logical plan rewrite rule, the new approach first filters the input relation to MATCH_RECOGNIZE using Joins constructed based on a subset of symbols taken from the PATTERN expression, then run the NFA-based MATCH_RECOGNIZE on the filtered rows, reducing the net cost. The rule also includes a specialized cardinality model for the Joins and a cost model for the NFA-based MATCH_RECOGNIZE operator for choosing an appropriate symbol set. The rewrite rule is applicable when the query pattern's definition is self-contained and either the input table has no duplicates or there is a window condition. Applying the rewrite rule to a query benchmark with 1,800 queries spanning over 6 patterns and 3 pattern definitions, we observed median speedups of 5.4X on Trino (v373 with ORC files on Hive), 57.5X on SQL Server (2019) using column store and 41.6X on row store.
doi_str_mv 10.14778/3579075.3579090
format Article
fullrecord <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_14778_3579075_3579090</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_14778_3579075_3579090</sourcerecordid><originalsourceid>FETCH-LOGICAL-c243t-7637082477ab8089ab2625c0741d41c6151038c0641737447e766385c7fa7ee73</originalsourceid><addsrcrecordid>eNpNj81KAzEURoMoWKt7l3mB1Jvfm1lKUasULMWuQxozY8QmkgyIb6_UWbg63-rjHEKuOSy4QrQ3UmMHqBdHdnBCZoJrYBY6PP23z8lFa-8AxhpuZ0St0vDGNrH2pR58DpFuyxfd-HGMNdNtDGXIaUwl011LeaBPJeV2Sc56_9Hi1cQ52d3fvSxXbP388Li8XbMglBwZGolgxa-d31uwnd8LI3QAVPxV8WC45iBtAKM4SlQKIxojrQ7Ye4wR5ZzA32-opbUae_dZ08HXb8fBHavdVO2mavkDsoVHiQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>High-Performance Row Pattern Recognition Using Joins</title><source>ACM Digital Library Complete</source><creator>Zhu, Erkang ; Huang, Silu ; Chaudhuri, Surajit</creator><creatorcontrib>Zhu, Erkang ; Huang, Silu ; Chaudhuri, Surajit</creatorcontrib><description>The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios overlook important optimization opportunities. We propose a new approach to use Join to speed up row pattern recognition in historical analysis scenarios for relational systems. Implemented as a logical plan rewrite rule, the new approach first filters the input relation to MATCH_RECOGNIZE using Joins constructed based on a subset of symbols taken from the PATTERN expression, then run the NFA-based MATCH_RECOGNIZE on the filtered rows, reducing the net cost. The rule also includes a specialized cardinality model for the Joins and a cost model for the NFA-based MATCH_RECOGNIZE operator for choosing an appropriate symbol set. The rewrite rule is applicable when the query pattern's definition is self-contained and either the input table has no duplicates or there is a window condition. Applying the rewrite rule to a query benchmark with 1,800 queries spanning over 6 patterns and 3 pattern definitions, we observed median speedups of 5.4X on Trino (v373 with ORC files on Hive), 57.5X on SQL Server (2019) using column store and 41.6X on row store.</description><identifier>ISSN: 2150-8097</identifier><identifier>EISSN: 2150-8097</identifier><identifier>DOI: 10.14778/3579075.3579090</identifier><language>eng</language><ispartof>Proceedings of the VLDB Endowment, 2023-01, Vol.16 (5), p.1181-1195</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c243t-7637082477ab8089ab2625c0741d41c6151038c0641737447e766385c7fa7ee73</citedby><cites>FETCH-LOGICAL-c243t-7637082477ab8089ab2625c0741d41c6151038c0641737447e766385c7fa7ee73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Zhu, Erkang</creatorcontrib><creatorcontrib>Huang, Silu</creatorcontrib><creatorcontrib>Chaudhuri, Surajit</creatorcontrib><title>High-Performance Row Pattern Recognition Using Joins</title><title>Proceedings of the VLDB Endowment</title><description>The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios overlook important optimization opportunities. We propose a new approach to use Join to speed up row pattern recognition in historical analysis scenarios for relational systems. Implemented as a logical plan rewrite rule, the new approach first filters the input relation to MATCH_RECOGNIZE using Joins constructed based on a subset of symbols taken from the PATTERN expression, then run the NFA-based MATCH_RECOGNIZE on the filtered rows, reducing the net cost. The rule also includes a specialized cardinality model for the Joins and a cost model for the NFA-based MATCH_RECOGNIZE operator for choosing an appropriate symbol set. The rewrite rule is applicable when the query pattern's definition is self-contained and either the input table has no duplicates or there is a window condition. Applying the rewrite rule to a query benchmark with 1,800 queries spanning over 6 patterns and 3 pattern definitions, we observed median speedups of 5.4X on Trino (v373 with ORC files on Hive), 57.5X on SQL Server (2019) using column store and 41.6X on row store.</description><issn>2150-8097</issn><issn>2150-8097</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpNj81KAzEURoMoWKt7l3mB1Jvfm1lKUasULMWuQxozY8QmkgyIb6_UWbg63-rjHEKuOSy4QrQ3UmMHqBdHdnBCZoJrYBY6PP23z8lFa-8AxhpuZ0St0vDGNrH2pR58DpFuyxfd-HGMNdNtDGXIaUwl011LeaBPJeV2Sc56_9Hi1cQ52d3fvSxXbP388Li8XbMglBwZGolgxa-d31uwnd8LI3QAVPxV8WC45iBtAKM4SlQKIxojrQ7Ye4wR5ZzA32-opbUae_dZ08HXb8fBHavdVO2mavkDsoVHiQ</recordid><startdate>202301</startdate><enddate>202301</enddate><creator>Zhu, Erkang</creator><creator>Huang, Silu</creator><creator>Chaudhuri, Surajit</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>202301</creationdate><title>High-Performance Row Pattern Recognition Using Joins</title><author>Zhu, Erkang ; Huang, Silu ; Chaudhuri, Surajit</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c243t-7637082477ab8089ab2625c0741d41c6151038c0641737447e766385c7fa7ee73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhu, Erkang</creatorcontrib><creatorcontrib>Huang, Silu</creatorcontrib><creatorcontrib>Chaudhuri, Surajit</creatorcontrib><collection>CrossRef</collection><jtitle>Proceedings of the VLDB Endowment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhu, Erkang</au><au>Huang, Silu</au><au>Chaudhuri, Surajit</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>High-Performance Row Pattern Recognition Using Joins</atitle><jtitle>Proceedings of the VLDB Endowment</jtitle><date>2023-01</date><risdate>2023</risdate><volume>16</volume><issue>5</issue><spage>1181</spage><epage>1195</epage><pages>1181-1195</pages><issn>2150-8097</issn><eissn>2150-8097</eissn><abstract>The SQL standard introduced MATCH_RECOGNIZE in 2016 for row pattern recognition. Since then, MATCH_RECOGNIZE has been supported by several leading relation systems, they implemented this function using Non-Deterministic Finite Automaton (NFA). While NFA is suitable for pattern recognition in streaming scenarios, the current uses of NFA by the relational systems for historical data analysis scenarios overlook important optimization opportunities. We propose a new approach to use Join to speed up row pattern recognition in historical analysis scenarios for relational systems. Implemented as a logical plan rewrite rule, the new approach first filters the input relation to MATCH_RECOGNIZE using Joins constructed based on a subset of symbols taken from the PATTERN expression, then run the NFA-based MATCH_RECOGNIZE on the filtered rows, reducing the net cost. The rule also includes a specialized cardinality model for the Joins and a cost model for the NFA-based MATCH_RECOGNIZE operator for choosing an appropriate symbol set. The rewrite rule is applicable when the query pattern's definition is self-contained and either the input table has no duplicates or there is a window condition. Applying the rewrite rule to a query benchmark with 1,800 queries spanning over 6 patterns and 3 pattern definitions, we observed median speedups of 5.4X on Trino (v373 with ORC files on Hive), 57.5X on SQL Server (2019) using column store and 41.6X on row store.</abstract><doi>10.14778/3579075.3579090</doi><tpages>15</tpages></addata></record>
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2023-01, Vol.16 (5), p.1181-1195
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_3579075_3579090
source ACM Digital Library Complete
title High-Performance Row Pattern Recognition Using Joins
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T00%3A57%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=High-Performance%20Row%20Pattern%20Recognition%20Using%20Joins&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Zhu,%20Erkang&rft.date=2023-01&rft.volume=16&rft.issue=5&rft.spage=1181&rft.epage=1195&rft.pages=1181-1195&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3579075.3579090&rft_dat=%3Ccrossref%3E10_14778_3579075_3579090%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true