Leveraging query logs and machine learning for parametric query optimization

Parametric query optimization (PQO) must address two problems: identify a relatively small number of plans to cache for a parameterized query (populateCache), and efficiently select the best cached plan to use for executing any instance of the parameterized query (getPlan). Our approach decouples th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings of the VLDB Endowment 2021-11, Vol.15 (3), p.401-413
Hauptverfasser: Vaidya, Kapil, Dutt, Anshuman, Narasayya, Vivek, Chaudhuri, Surajit
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Parametric query optimization (PQO) must address two problems: identify a relatively small number of plans to cache for a parameterized query (populateCache), and efficiently select the best cached plan to use for executing any instance of the parameterized query (getPlan). Our approach decouples these two decisions. We formulate populateCache as an optimization problem with the goal of identifying a set of plans that minimizes the optimizer estimated cost of queries in the log, and present an efficient algorithm. For getPlan, we leverage query logs to train machine learning (ML) models to choose the lowest optimizer-estimated cost plan from the cached plans. We conduct extensive experiments using complex parameterized queries from benchmarks and real workloads. Our algorithm for populateCache achieves low geometric mean sub-optimality (1.2) even for complex queries using relatively few plans, and scales well to large query logs. The mean latency of our ML model based getPlan technique (~ 210μ sec ) is between one to four orders of magnitude faster compared to prior PQO techniques. The mean sub-optimality is low (1.05), and the 95 th percentile sub-optimality (1.3) is between 1.1× and 25× lower compared to prior techniques. Finally, we present an efficient algorithm for getPlan that leverages execution time information in query logs to circumvent inaccuracies of the query optimizer's cost estimates.
ISSN:2150-8097
2150-8097
DOI:10.14778/3494124.3494126